Abstract



We consider the distribution of a graph invariant of central similarity proximity catch digraphs (PCDs) 
based on one dimensional data. The central similarity PCDs are also a special type of parameterized 
random digraph family defined with two parameters, a centrality parameter and an expansion parameter, 
and for one dimensional data, central similarity PCDs can also be viewed as a type of interval catch 
digraphs. The graph invariant we consider is the relative density of central similarity PCDs. We prove 
that relative density of central similarity PCDs is a $U$-statistic and obtain the asymptotic normality under
mild regularity conditions using the central limit theory of $U$-statistics. For one dimensional uniform

data, we provide the asymptotic distribution of the relative density of the central similarity PCDs for the 
entire ranges of centrality and expansion parameters. Consequently, we determine the optimal parameter 
values at which the rate of convergence (to normality) is fastest. We also provide the connection with 
class cover catch digraphs and the extension of central similarity PCDs to higher dimensions. 



Keywords: asymptotic normality; class cover catch digraph; intersection digraph; interval catch digraph; 

1 Introduction



Proximity catch digraphs (PCDs) were introduced recently and have applications in spatial data analysis and
statistical pattern classification. The PCDs are a special type of proximity graphs, which were introduced by
Toussaint. Furthermore, the PCDs are closely related to the class cover problem of Cannon and Cowen
(2000). The PCDs are vertex-random digraphs in which each vertex corresponds to a data point, and directed
edges (i.e., arcs) are defined by some bivariate relation on the data using the regions based on these data
points.



Address: Department of Mathematics, Koç University, 34450 Sarıyer, Istanbul, Turkey, e-mail: elceyhan@ku.edu.tr, tel: +90
(212) 338-1845, fax: +90 (212) 338-1559.
Priebe et al. (2001) introduced the class cover catch digraphs (CCCDs) in $\mathbb{R}$, which are a special type of
PCDs, and gave the exact and the asymptotic distribution of the domination number of the CCCDs based on
data from two classes, say $\mathcal{X}$ and $\mathcal{Y}$, with uniform distribution on a bounded interval in $\mathbb{R}$. DeVinney et al.
(2002), Marchette and Priebe (2003), Priebe et al. (2003a), Priebe et al. (2003b), and DeVinney and Priebe
(2006) applied the concept in higher dimensions and demonstrated relatively good performance of CCCDs
in classification. Ceyhan and Priebe (2003) introduced central similarity PCDs for two dimensional data
in an unparameterized fashion; the parameterized version of this PCD was later developed by Ceyhan et al.
(2007), where the relative density of the PCD is calculated and used for testing bivariate spatial patterns
in $\mathbb{R}^2$. Ceyhan and Priebe (2005, 2007) and Ceyhan (2011b) applied the same concept (for a different PCD
family called proportional-edge PCD) in testing spatial point patterns in $\mathbb{R}^2$. The distribution of the relative
density of the proportional-edge PCDs for one dimensional uniform data is provided in Ceyhan (2011a).



In this article, we consider central similarity PCDs for one dimensional data. We derive the asymptotic 
distribution of a graph invariant called relative (arc) density of central similarity PCDs. Relative density 
is the ratio of number of arcs in a given digraph with n vertices to the total number of arcs possible (i.e., 
to the number of arcs in a complete symmetric digraph of order n). We prove that, properly scaled, the 
relative density of the central similarity PCDs is a $U$-statistic, which yields the asymptotic normality by
the general central limit theory of $U$-statistics. Furthermore, we derive the explicit form of the asymptotic
normal distribution of the relative density of the PCDs for uniform one dimensional $\mathcal{X}$ points whose support
is partitioned by class $\mathcal{Y}$ points. We consider the entire ranges of the expansion and centrality parameters,
and the asymptotic distribution is derived as a function of these parameters based on detailed calculations.
The relative density of central similarity PCDs is first investigated for uniform data in one interval (in $\mathbb{R}$)
and the analysis is generalized to uniform data in multiple intervals. These results can be used in applying 
the relative density for testing spatial interaction between classes of one dimensional data. Moreover, the 
behavior of the relative density in the one dimensional case forms the foundation of our investigation and 
extension of the topic in higher dimensions. 

We define the proximity catch digraphs and describe the central similarity PCDs in Section 2, define
their relative density and provide preliminary results in Section 3, provide the distribution of the relative
density for uniform data in one interval in Section 4 and in multiple intervals in Section 5, provide extension
to higher dimensions in Section 6, and provide discussion and conclusions in Section 7. Shorter proofs are
given in the main body of the article, while longer proofs are deferred to the Appendix Sections.



2 Vertex-Random Proximity Catch Digraphs



We first define vertex-random PCDs in a general setting. Let $(\Omega, \mathcal{M})$ be a measurable space and $\mathcal{X}_n = \{X_1, \ldots, X_n\}$ and $\mathcal{Y}_m = \{Y_1, Y_2, \ldots, Y_m\}$ be two sets of $\Omega$-valued random variables from classes $\mathcal{X}$ and $\mathcal{Y}$, respectively, with joint probability distribution $F_{X,Y}$ and marginals $F_X$ and $F_Y$, respectively. A PCD comprises a set $V$ of vertices and a set $A$ of arcs. For example, in the two class case, with classes $\mathcal{X}$ and $\mathcal{Y}$, we choose the $\mathcal{X}$ points to be the vertices and put an arc from $X_i \in \mathcal{X}_n$ to $X_j \in \mathcal{X}_n$, based on a binary relation which measures the relative allocation of $X_i$ and $X_j$ with respect to $\mathcal{Y}$ points. Notice that the randomness is only on the vertices, hence the name vertex-random PCDs. Consider the map $N : \Omega \to \mathcal{P}(\Omega)$, where $\mathcal{P}(\Omega)$ represents the power set of $\Omega$. Then given $\mathcal{Y}_m \subseteq \Omega$, the proximity map $N(\cdot)$ associates with each point $x \in \Omega$ a proximity region $N(x) \subseteq \Omega$. For $B \subseteq \Omega$, the $\Gamma_1$-region is the image of the map $\Gamma_1(\cdot, N) : \mathcal{P}(\Omega) \to \mathcal{P}(\Omega)$ that associates the region $\Gamma_1(B, N) := \{z \in \Omega : B \subseteq N(z)\}$ with the set $B$. For a point $x \in \Omega$, we denote $\Gamma_1(\{x\}, N)$ as $\Gamma_1(x, N)$. Notice that while the proximity region is defined for one point, a $\Gamma_1$-region is defined for a point or set of points. The vertex-random PCD has the vertex set $V = \mathcal{X}_n$ and arc set $A$ defined by $(X_i, X_j) \in A$ iff $X_j \in N(X_i)$. Let arc probability be defined as $p_a(i,j) := P((X_i, X_j) \in A)$ for all $i \ne j$, $i, j = 1, 2, \ldots, n$. Given $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\}$, let $\mathcal{X}_n$ be a random sample from $F_X$. Then $N(X_i)$ are also iid and the same holds for $\Gamma_1(X_i, N)$. Hence $p_a(i,j) = p_a$ for all $i \ne j$, $i, j = 1, 2, \ldots, n$ for such $\mathcal{X}_n$.

2.1 Central Similarity PCDs for One Dimensional Data



In the special case of central similarity PCDs for one dimensional data, we have $\Omega = \mathbb{R}$. Let $Y_{(i)}$ be the $i$th order statistic of $\mathcal{Y}_m$ for $i = 1, 2, \ldots, m$. Assume the $Y_{(i)}$ values are distinct (which happens with probability one for continuous distributions). Then the $Y_{(i)}$ values partition $\mathbb{R}$ into $(m + 1)$ intervals. Let

\[ -\infty =: Y_{(0)} < Y_{(1)} < \cdots < Y_{(m)} < Y_{(m+1)} := \infty. \]

We call intervals $(-\infty, Y_{(1)})$ and $(Y_{(m)}, \infty)$ the end intervals, and intervals $(Y_{(i-1)}, Y_{(i)})$ for $i = 2, \ldots, m$ the middle intervals. Then we define the central similarity PCD with the parameter $\tau > 0$ for two one dimensional data sets, $\mathcal{X}_n$ and $\mathcal{Y}_m$, from classes $\mathcal{X}$ and $\mathcal{Y}$, respectively, as follows. For $x \in (Y_{(i-1)}, Y_{(i)})$ with $i \in \{2, \ldots, m\}$ (i.e., for $x$ in a middle interval) and $M_c \in (Y_{(i-1)}, Y_{(i)})$ such that $c \times 100\%$ of $(Y_{(i)} - Y_{(i-1)})$ is to the left of $M_c$ (i.e., $M_c = Y_{(i-1)} + c\,(Y_{(i)} - Y_{(i-1)})$),



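\[ N(x,\tau,c) = \begin{cases} \left( x - \tau\,(x - Y_{(i-1)}),\; x + \dfrac{\tau\,(1-c)\,(x - Y_{(i-1)})}{c} \right) \cap (Y_{(i-1)}, Y_{(i)}) & \text{if } x \in (Y_{(i-1)}, M_c), \\[2mm] \left( x - \dfrac{c\,\tau\,(Y_{(i)} - x)}{1-c},\; x + \tau\,(Y_{(i)} - x) \right) \cap (Y_{(i-1)}, Y_{(i)}) & \text{if } x \in (M_c, Y_{(i)}). \end{cases} \tag{1} \]

Observe that with $\tau \in (0, 1)$, we have

\[ N(x,\tau,c) = \begin{cases} \left( x - \tau\,(x - Y_{(i-1)}),\; x + \dfrac{\tau\,(1-c)\,(x - Y_{(i-1)})}{c} \right) & \text{if } x \in (Y_{(i-1)}, M_c), \\[2mm] \left( x - \dfrac{c\,\tau\,(Y_{(i)} - x)}{1-c},\; x + \tau\,(Y_{(i)} - x) \right) & \text{if } x \in (M_c, Y_{(i)}), \end{cases} \tag{2} \]

and with $\tau \ge 1$, we have

\[ N(x,\tau,c) = \begin{cases} \left( Y_{(i-1)},\; x + \dfrac{\tau\,(1-c)\,(x - Y_{(i-1)})}{c} \right) & \text{if } x \in \left( Y_{(i-1)},\; \dfrac{c\,Y_{(i)} + \tau\,(1-c)\,Y_{(i-1)}}{c + \tau\,(1-c)} \right), \\[2mm] \left( Y_{(i-1)}, Y_{(i)} \right) & \text{if } x \in \left[ \dfrac{c\,Y_{(i)} + \tau\,(1-c)\,Y_{(i-1)}}{c + \tau\,(1-c)},\; \dfrac{(1-c)\,Y_{(i-1)} + c\,\tau\,Y_{(i)}}{1 - c + c\,\tau} \right], \\[2mm] \left( x - \dfrac{c\,\tau\,(Y_{(i)} - x)}{1-c},\; Y_{(i)} \right) & \text{if } x \in \left( \dfrac{(1-c)\,Y_{(i-1)} + c\,\tau\,Y_{(i)}}{1 - c + c\,\tau},\; Y_{(i)} \right). \end{cases} \tag{3} \]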



For an illustration of $N(x,\tau,c)$ in the middle interval case, see Figure 1 (left) where $\mathcal{Y}_2 = \{y_1, y_2\}$ with
$y_1 = 0$ and $y_2 = 1$ (hence $M_c = c$).

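Additionally, for $x \in (Y_{(i-1)}, Y_{(i)})$ with $i \in \{1, m+1\}$ (i.e., for $x$ in an end interval), the central similarity proximity region only has an expansion parameter, but not a centrality parameter. Hence we let $N_e(x,\tau)$ be the central similarity proximity region for an $x$ in an end interval. Then with $\tau \in (0, 1)$, we have

\[ N_e(x,\tau) = \begin{cases} \left( x - \tau\,(Y_{(1)} - x),\; x + \tau\,(Y_{(1)} - x) \right) & \text{if } x < Y_{(1)}, \\ \left( x - \tau\,(x - Y_{(m)}),\; x + \tau\,(x - Y_{(m)}) \right) & \text{if } x > Y_{(m)}, \end{cases} \tag{4} \]

and with $\tau \ge 1$, we have

\[ N_e(x,\tau) = \begin{cases} \left( x - \tau\,(Y_{(1)} - x),\; Y_{(1)} \right) & \text{if } x < Y_{(1)}, \\ \left( Y_{(m)},\; x + \tau\,(x - Y_{(m)}) \right) & \text{if } x > Y_{(m)}. \end{cases} \tag{5} \]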



If $x \in \mathcal{Y}_m$, then we define $N(x,\tau,c) = \{x\}$ and $N_e(x,\tau) = \{x\}$ for all $\tau > 0$, and if $x = M_c$, then in Equation (1), we arbitrarily assign $N(x,\tau,c)$ to be one of

\[ \left( x - \tau\,(x - Y_{(i-1)}),\; x + \frac{\tau\,(1-c)\,(x - Y_{(i-1)})}{c} \right) \cap (Y_{(i-1)}, Y_{(i)}) \quad \text{or} \quad \left( x - \frac{c\,\tau\,(Y_{(i)} - x)}{1-c},\; x + \tau\,(Y_{(i)} - x) \right) \cap (Y_{(i-1)}, Y_{(i)}). \]

For $X$ from a continuous distribution, these special cases in the construction of the central similarity proximity region ($X \in \mathcal{Y}_m$ and $X = M_c$) happen with probability zero. Notice that $\tau > 0$ implies $x \in N(x,\tau,c)$ for all $x \in [Y_{(i-1)}, Y_{(i)}]$ with $i \in \{2, \ldots, m\}$ and $x \in N_e(x,\tau)$ for all $x \in [Y_{(i-1)}, Y_{(i)}]$ with $i \in \{1, m+1\}$. Furthermore, $\lim_{\tau \to \infty} N(x,\tau,c) = (Y_{(i-1)}, Y_{(i)})$ (and $\lim_{\tau \to \infty} N_e(x,\tau) = (Y_{(i-1)}, Y_{(i)})$) for all $x \in (Y_{(i-1)}, Y_{(i)})$ with $i \in \{2, \ldots, m\}$ (and $i \in \{1, m+1\}$), so we define $N(x,\infty,c) = (Y_{(i-1)}, Y_{(i)})$ (and $N_e(x,\infty) = (Y_{(i-1)}, Y_{(i)})$) for all such $x$.
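
The proximity regions above can be sketched in a few lines of code. The following is a minimal illustration, assuming Equations (1), (4), and (5) as stated; the function names `middle_region` and `end_region` are hypothetical and chosen only for this example, and the tie $x = M_c$ is resolved by the second form.

```python
def middle_region(x, tau, c, lo, hi):
    """Endpoints of N(x, tau, c) for x in the middle interval (lo, hi), Equation (1)."""
    m_c = lo + c * (hi - lo)              # the centre M_c
    if x < m_c:
        left = x - tau * (x - lo)
        right = x + tau * (1 - c) * (x - lo) / c
    else:                                 # the tie x == M_c has probability zero
        left = x - c * tau * (hi - x) / (1 - c)
        right = x + tau * (hi - x)
    return max(left, lo), min(right, hi)  # intersect with (lo, hi)


def end_region(x, tau, y_end):
    """Endpoints of N_e(x, tau); y_end is Y_(1) if x < y_end, otherwise Y_(m)."""
    d = abs(y_end - x)
    if x < y_end:                         # left end interval
        return x - tau * d, min(x + tau * d, y_end)
    return max(x - tau * d, y_end), x + tau * d   # right end interval


# tau in (0, 1): a proper subinterval around x
small = middle_region(0.2, 0.5, 0.5, 0.0, 1.0)
# tau >= 1 and x near the centre: the whole interval, as in Equation (3)
whole = middle_region(0.5, 2.0, 0.5, 0.0, 1.0)
```

Taking the intersection with the interval via `max` and `min` covers both the $\tau \in (0,1)$ and the $\tau \ge 1$ cases without a separate branch.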



Figure 1: Plotted on the left is an illustration of the construction of the central similarity proximity region
$N(x,\tau,c)$ with $\tau \in (0, 1)$, $\mathcal{Y}_2 = \{y_1, y_2\}$ with $y_1 = 0$ and $y_2 = 1$ (hence $M_c = c$) and $x \in (0, c)$ (top) and
$x \in (c, 1)$ (bottom); and on the right is the proximity region associated with the CCCD, i.e., $N(x, \tau = 1, c = 1/2)$
for an $x \in (0, 1/2)$ (top) and $x \in (1/2, 1)$ (bottom).

The vertex-random central similarity PCD has the vertex set $\mathcal{X}_n$ and arc set $A$ defined by $(X_i, X_j) \in A$ iff $X_j \in N(X_i,\tau,c)$ for $X_i, X_j$ in the middle intervals and $(X_i, X_j) \in A$ iff $X_j \in N_e(X_i,\tau)$ for $X_i, X_j$ in the end intervals. We denote such digraphs as $\mathscr{D}_{n,m}(\tau,c)$. A $\mathscr{D}_{n,m}(\tau,c)$-digraph is a pseudo digraph according to some authors, if loops are allowed (see, e.g., Chartrand and Lesniak (1996)). The $\mathscr{D}_{n,m}(\tau,c)$-digraphs are closely related to the proximity graphs of Jaromczyk and Toussaint (1992) and might be considered as a special case of covering sets of Tuza. Our vertex-random proximity digraph is not a standard random graph (see, e.g., Janson et al. (2000)). The randomness of a $\mathscr{D}_{n,m}(\tau,c)$-digraph lies in the fact that the vertices are random with the joint distribution $F_{X,Y}$, but arcs $(X_i, X_j)$ are deterministic functions of the random variable $X_j$ and the random set $N(X_i,\tau,c)$ in the middle intervals and the random set $N_e(X_i,\tau)$ in the end intervals. In $\mathbb{R}$, the vertex-random PCD is a special case of interval catch digraphs (see, e.g., Sen et al. (1989) and Prisner (1994)). Furthermore, when $\tau = 1$ and $c = 1/2$ (i.e., $M_c = (Y_{(i-1)} + Y_{(i)})/2$) we have $N(x, 1, 1/2) = B(x, r(x))$ for an $x$ in a middle interval and $N_e(x, 1) = B(x, r(x))$ for an $x$ in an end interval, where $r(x) = d(x, \mathcal{Y}_m) = \min_{y \in \mathcal{Y}_m} d(x, y)$, and the corresponding PCD is the CCCD of Priebe et al. (2001). See also Figure 1 (right).
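
As a rough illustration of the arc rule and of the CCCD special case, the sketch below builds the arc set for a small hand-picked configuration and compares it with the ball rule $B(x, r(x))$ at $\tau = 1$, $c = 1/2$. The helper names (`region`, `arcs`, `cccd_arcs`) and the sample points are assumptions made for this example only.

```python
from bisect import bisect_left


def region(x, ys, tau, c):
    """Proximity region of x given sorted, distinct class-Y points ys."""
    i = bisect_left(ys, x)               # number of Y points below x
    if i == 0:                           # left end interval
        d = ys[0] - x
        return x - tau * d, min(x + tau * d, ys[0])
    if i == len(ys):                     # right end interval
        d = x - ys[-1]
        return max(x - tau * d, ys[-1]), x + tau * d
    lo, hi = ys[i - 1], ys[i]            # middle interval
    if x < lo + c * (hi - lo):
        a, b = x - tau * (x - lo), x + tau * (1 - c) * (x - lo) / c
    else:
        a, b = x - c * tau * (hi - x) / (1 - c), x + tau * (hi - x)
    return max(a, lo), min(b, hi)


def arcs(xs, ys, tau, c):
    """Arc set {(i, j)}: X_j lies in the proximity region of X_i."""
    ys = sorted(ys)
    out = set()
    for i, xi in enumerate(xs):
        a, b = region(xi, ys, tau, c)
        out |= {(i, j) for j, xj in enumerate(xs) if i != j and a < xj < b}
    return out


def cccd_arcs(xs, ys):
    """Arc set under the ball rule: X_j in B(X_i, r(X_i)), r = distance to nearest Y."""
    out = set()
    for i, xi in enumerate(xs):
        r = min(abs(xi - y) for y in ys)
        out |= {(i, j) for j, xj in enumerate(xs) if i != j and abs(xj - xi) < r}
    return out


xs = [-0.5, -0.2, 0.1, 0.3, 0.45, 0.8, 1.2, 1.9]   # hypothetical X points
ys = [0.0, 1.0]                                    # hypothetical Y points
same = arcs(xs, ys, 1.0, 0.5) == cccd_arcs(xs, ys)
```

On this configuration the two arc sets coincide, which is what the identity $N(x, 1, 1/2) = B(x, r(x))$ predicts.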



3 Relative Density of Vertex-Random PCDs



Let $D_n = (V, A)$ be a digraph with vertex set $V = \{v_1, v_2, \ldots, v_n\}$ and arc set $A$, and let $|\cdot|$ stand for the set cardinality function. The relative density of the digraph $D_n$, which is of order $|V| = n \ge 2$, denoted $\rho(D_n)$, is defined as (Janson et al. (2000))

\[ \rho(D_n) = \frac{|A|}{n(n-1)}. \]

Thus $\rho(D_n)$ represents the ratio of the number of arcs in the digraph $D_n$ to the number of arcs in the complete symmetric digraph of order $n$, which is $n(n-1)$. For $n \le 1$, we set $\rho(D_n) = 0$, since there are no arcs. If $D_n$ is a random digraph in which arcs result from a random process, then the arc probability between vertices $v_i, v_j$ is $p_a(i,j) = P((v_i, v_j) \in A)$ for all $i \ne j$, $i, j = 1, 2, \ldots, n$.

Given $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\}$, let $\mathcal{X}_n$ be a random sample from $F_X$ and $D_n$ be the PCD based on proximity region $N(\cdot)$ with vertices $\mathcal{X}_n$ and the arc set $A$ defined as $(X_i, X_j) \in A$ iff $X_j \in N(X_i)$. Let $h_{ij} := (g_{ij} + g_{ji})/2$ where $g_{ij} = I((X_i, X_j) \in A) = I(X_j \in N(X_i))$. Then we can rewrite the relative density as follows:

\[ \rho(D_n) = \frac{2}{n(n-1)} \sum_{i<j} h_{ij}. \]
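
To make the two expressions for the relative density concrete, here is a small sketch that computes $|A|/(n(n-1))$ directly and through the symmetric kernel $h_{ij}$. The function names and the example arc set are hypothetical; exact rational arithmetic is used so the two values can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations


def relative_density(n, arc_set):
    """|A| / (n (n - 1)), with the convention rho = 0 for n <= 1."""
    if n <= 1:
        return Fraction(0)
    return Fraction(len(arc_set), n * (n - 1))


def relative_density_ustat(n, arc_set):
    """The same quantity written as a U-statistic with kernel h_ij = (g_ij + g_ji) / 2."""
    if n <= 1:
        return Fraction(0)
    g = lambda i, j: 1 if (i, j) in arc_set else 0
    total = sum(Fraction(g(i, j) + g(j, i), 2) for i, j in combinations(range(n), 2))
    return 2 * total / (n * (n - 1))


example_arcs = {(0, 1), (1, 0), (2, 0), (3, 1)}     # a digraph on 4 vertices
direct = relative_density(4, example_arcs)          # Fraction(1, 3)
via_kernel = relative_density_ustat(4, example_arcs)
```

Each arc $(i, j)$ contributes $1/2$ to exactly one $h_{ij}$ with $i < j$, which is why the factor 2 appears in the kernel form.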



Although the digraph is asymmetric, $h_{ij}$ is defined as the average number of arcs between $X_i$ and $X_j$ in order to produce a symmetric kernel with finite variance (Lehmann (1988)). The relative density $\rho(D_n)$ is a random variable that depends on $n$, $F$, and $N(\cdot)$ (i.e., $\mathcal{Y}_m$). But $E[\rho(D_n)] = E[h_{12}] = p_a$ only depends on $F$ and $N(\cdot)$. Furthermore,

\[ 0 \le \mathrm{Var}[\rho(D_n)] = \frac{4}{n^2 (n-1)^2}\, \mathrm{Var}\Big[ \sum_{i<j} h_{ij} \Big] = \frac{2}{n(n-1)}\, \mathrm{Var}[h_{12}] + \frac{4(n-2)}{n(n-1)}\, \mathrm{Cov}[h_{12}, h_{13}] \le 1/4. \tag{6} \]

Hence $\rho(D_n)$ is a one-sample $U$-statistic of degree 2 and is an unbiased estimator of arc probability $p_a$. If, additionally, $\nu = \mathrm{Cov}[h_{ij}, h_{ik}] > 0$ for all $i \ne j \ne k$, $i, j, k \in \{1, 2, \ldots, n\}$, then a CLT for $U$-statistics (Lehmann (1988)) yields $\sqrt{n}\,[\rho(D_n) - p_a] \xrightarrow{\mathcal{L}} \mathcal{N}(0, 4\nu)$ as $n \to \infty$, where $\xrightarrow{\mathcal{L}}$ stands for convergence in law and $\mathcal{N}(\mu, \sigma^2)$ stands for the normal distribution with mean $\mu$ and variance $\sigma^2$.

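In Equation (6), we have

\[ \mathrm{Var}[h_{ij}] = \mathrm{Var}[h_{12}] = E[h_{12}^2] - (E[h_{12}])^2 = E[(g_{12} + g_{21})^2/4] - p_a^2 = \big(E[g_{12}^2] + 2\,E[g_{12}]\,E[g_{21}] + E[g_{21}^2]\big)/4 - p_a^2 = (p_a + 2 p_a^2 + p_a)/4 - p_a^2 = (p_a - p_a^2)/2 = p_a (1 - p_a)/2 \]

and the covariance is

\[ \mathrm{Cov}[h_{12}, h_{13}] = E[h_{12} h_{13}] - E[h_{12}]\,E[h_{13}] = E[h_{12} h_{13}] - p_a^2, \]

with

\begin{align*}
4\,E[h_{12} h_{13}] &= E[(g_{12} + g_{21})(g_{13} + g_{31})] = E[g_{12} g_{13} + g_{12} g_{31} + g_{21} g_{13} + g_{21} g_{31}] \\
&= E[I(X_2 \in N(X_1))\,I(X_3 \in N(X_1)) + I(X_2 \in N(X_1))\,I(X_1 \in N(X_3)) \\
&\qquad + I(X_1 \in N(X_2))\,I(X_3 \in N(X_1)) + I(X_1 \in N(X_2))\,I(X_1 \in N(X_3))] \\
&= E[I(\{X_2, X_3\} \subseteq N(X_1)) + I(X_2 \in N(X_1))\,I(X_3 \in \Gamma_1(X_1, N)) \\
&\qquad + I(X_2 \in \Gamma_1(X_1, N))\,I(X_3 \in N(X_1)) + I(X_2 \in \Gamma_1(X_1, N))\,I(X_3 \in \Gamma_1(X_1, N))] \\
&= P(\{X_2, X_3\} \subseteq N(X_1)) + 2\,P(X_2 \in N(X_1), X_3 \in \Gamma_1(X_1, N)) + P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, N)).
\end{align*}

Then $\nu = \mathrm{Cov}(h_{ij}, h_{ik}) = E[h_{ij} h_{ik}] - E[h_{ij}]\,E[h_{ik}] = E[h_{ij} h_{ik}] - p_a^2 = E[h_{12} h_{13}] - p_a^2 > 0$ iff
$P(\{X_2, X_3\} \subseteq N(X_1)) + 2\,P(X_2 \in N(X_1), X_3 \in \Gamma_1(X_1, N)) + P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, N)) > 4 p_a^2$.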



Notice also that

\[ E[|h_{ij}|^3] = E[(g_{ij} + g_{ji})^3/8] = E[g_{ij}^3 + 3 g_{ij}^2 g_{ji} + 3 g_{ij} g_{ji}^2 + g_{ji}^3]/8 = E[g_{ij} + 3 g_{ij} g_{ji} + 3 g_{ij} g_{ji} + g_{ji}]/8 = (2\,E[g_{ij}] + 6\,E[g_{ij}]\,E[g_{ji}])/8 = (p_a + 3 p_a^2)/4 < \infty. \]

Then for $\nu > 0$, the sharpest rate of convergence in the asymptotic normality of $\rho(D_n)$ is

\[ \sup_{t \in \mathbb{R}} \left| P\left( \frac{\sqrt{n}\,(\rho(D_n) - p_a)}{\sqrt{4\nu}} \le t \right) - \Phi(t) \right| \le 8\,K\,p_a\,(4\nu)^{-3/2}\,n^{-1/2} = K\,\frac{p_a}{\sqrt{n\,\nu^3}} \tag{7} \]

where $K$ is a constant and $\Phi(t)$ is the distribution function for the standard normal distribution (Callaert and Janssen
(1973)).



In general a random digraph, just like a random graph, can be obtained by starting with a set of $n$ vertices
and adding arcs between them at random. We can consider the digraph counterpart of the Erdős-Rényi
model for random graphs, denoted $D(n,p)$, in which every possible arc occurs independently with probability
$p$ (Erdős and Rényi (1959)). Notice that for the random digraph $D(n,p)$, the relative density of $D(n,p)$ is a
$U$-statistic; however, the asymptotic distribution of its relative density is degenerate (with $\rho(D(n,p)) \xrightarrow{\mathcal{L}} p$,
as $n \to \infty$) since the covariance term is zero due to the independence between the arcs.

Let $\mathcal{F}(\mathbb{R}) := \{F_{X,Y}$ on $\mathbb{R}$ with $P(X = Y) = 0$ and the marginals, $F_X$ and $F_Y$, are non-atomic$\}$. In this article, we consider $\mathscr{D}_{n,m}(\tau,c)$-digraphs for which $\mathcal{X}_n$ and $\mathcal{Y}_m$ are random samples from $F_X$ and $F_Y$, respectively, and the joint distribution of $X, Y$ is $F_{X,Y} \in \mathcal{F}(\mathbb{R})$. Then the order statistics of $\mathcal{X}_n$ and $\mathcal{Y}_m$ are distinct with probability one. We call such digraphs $\mathcal{F}(\mathbb{R})$-random $\mathscr{D}_{n,m}(\tau,c)$-digraphs and focus on the random variable $\rho(\mathscr{D}_{n,m}(\tau,c))$. For notational brevity, we use $\rho_{n,m}(\tau,c)$ instead of $\rho(\mathscr{D}_{n,m}(\tau,c))$. It is trivial to see that $0 \le \rho_{n,m}(\tau,c) \le 1$, and $\rho_{n,m}(\tau,c) > 0$ for nontrivial digraphs.



3.1 The Distribution of the Relative Density of $\mathcal{F}(\mathbb{R})$-random $\mathscr{D}_{n,m}(\tau,c)$-digraphs

Let $I_i := (Y_{(i-1)}, Y_{(i)})$, $\mathcal{X}_{[i]} := \mathcal{X}_n \cap I_i$, and $\mathcal{Y}_{[i]} := \{Y_{(i-1)}, Y_{(i)}\}$ for $i = 1, 2, \ldots, (m+1)$. Let $D_{[i]}(\tau,c)$ be the component of the random $\mathscr{D}_{n,m}(\tau,c)$-digraph induced by the pair $\mathcal{X}_{[i]}$ and $\mathcal{Y}_{[i]}$. Then we have a disconnected digraph with subdigraphs $D_{[i]}(\tau,c)$ for $i = 1, 2, \ldots, (m+1)$, each of which might be null or itself disconnected. Let $A_{[i]}$ be the arc set of $D_{[i]}(\tau,c)$, and $\rho_{[i]}(\tau,c)$ denote the relative density of $D_{[i]}(\tau,c)$; $n_i := |\mathcal{X}_{[i]}|$, and $F_i$ be the density $F_X$ restricted to $I_i$ for $i \in \{1, 2, \ldots, m+1\}$. Furthermore, let $M_c^{[i]} \in I_i$ be the point so that it divides the interval $I_i$ in ratios $c$ and $1 - c$ (i.e., length of the subinterval to the left of $M_c^{[i]}$ is $c \times 100\%$ of the length of $I_i$) for $i \in \{2, \ldots, m\}$. Notice that for $i \in \{2, \ldots, m\}$ (i.e., middle intervals), $D_{[i]}(\tau,c)$ is based on the proximity region $N(x,\tau,c)$ and for $i \in \{1, m+1\}$ (i.e., end intervals), $D_{[i]}(\tau,c)$ is based on the proximity region $N_e(x,\tau)$. Since we have at most $m + 1$ subdigraphs that are disconnected, it follows that we have at most $n_T := \sum_{i=1}^{m+1} n_i (n_i - 1)$ arcs in the digraph $\mathscr{D}_{n,m}(\tau,c)$. Then we define the relative density for the entire digraph as

\[ \rho_{n,m}(\tau,c) := \frac{|A|}{n_T} = \frac{\sum_{i=1}^{m+1} |A_{[i]}|}{n_T} = \frac{1}{n_T} \sum_{i=1}^{m+1} \big( n_i (n_i - 1) \big)\, \rho_{[i]}(\tau,c). \tag{8} \]

Since $\frac{n_i (n_i - 1)}{n_T} \ge 0$ for each $i$ and $\sum_{i=1}^{m+1} \frac{n_i (n_i - 1)}{n_T} = 1$, it follows that $\rho_{n,m}(\tau,c)$ is a mixture of the $\rho_{[i]}(\tau,c)$.
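
The mixture representation in Equation (8) can be checked on a toy example. The sketch below assumes that per-interval point counts $n_i$ and arc counts $|A_{[i]}|$ are already available; the function names and the numbers are hypothetical.

```python
from fractions import Fraction


def overall_density(counts, arc_counts):
    """|A| / n_T with n_T = sum_i n_i (n_i - 1); counts[i] = n_i, arc_counts[i] = |A_[i]|."""
    n_t = sum(n * (n - 1) for n in counts)
    if n_t == 0:
        return Fraction(0)
    return Fraction(sum(arc_counts), n_t)


def mixture_density(counts, arc_counts):
    """Weighted average of the per-interval densities with weights n_i (n_i - 1) / n_T."""
    n_t = sum(n * (n - 1) for n in counts)
    total = Fraction(0)
    for n, a in zip(counts, arc_counts):
        if n > 1:                                   # intervals with n_i <= 1 carry zero weight
            rho_i = Fraction(a, n * (n - 1))
            total += Fraction(n * (n - 1), n_t) * rho_i
    return total


counts = [3, 1, 4]        # n_1, n_2, n_3 for m = 2
arc_counts = [4, 0, 9]    # |A_[1]|, |A_[2]|, |A_[3]|
rho_direct = overall_density(counts, arc_counts)    # Fraction(13, 18)
rho_mixture = mixture_density(counts, arc_counts)
```

Intervals holding at most one point contribute nothing to $n_T$, so they drop out of both forms.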

We study the simpler random variable $\rho_{[i]}(\tau,c)$ first. In the remainder of this section, the almost sure (a.s.)
results follow from the fact that the marginal distributions $F_X$ and $F_Y$ are non-atomic.

Lemma 3.1. Let $D_{[i]}(\tau,c)$ be the digraph induced by $\mathcal{X}$ points in the end intervals (i.e., $i \in \{1, (m+1)\}$) and $\rho_{[i]}(\tau,c)$ be the corresponding relative density. For $\tau > 0$, if $n_i \le 1$, then $\rho_{[i]}(\tau,c) = 0$. For $\tau \ge 1$, if $n_i > 1$, then $\rho_{[i]}(\tau,c) \ge 1/2$ a.s.

Proof: Let $i = m + 1$ (i.e., consider the right end interval). For all $\tau > 0$, if $n_{m+1} \le 1$, then by definition $\rho_{[m+1]}(\tau,c) = 0$. So we assume $n_{m+1} > 1$. Let $\mathcal{X}_{[m+1]} = \{Z_1, Z_2, \ldots, Z_{n_{m+1}}\}$ and $Z_{(j)}$ be the corresponding order statistics. Then for $\tau \ge 1$, there is an arc from $Z_{(j)}$ to each $Z_{(k)}$ for $k < j$, with $j, k \in \{1, 2, \ldots, n_{m+1}\}$ (and possibly to some other $Z_l$), since $N_e(Z_{(j)}, \tau) = (Y_{(m)}, Z_{(j)} + \tau\,(Z_{(j)} - Y_{(m)}))$ and so $Z_{(k)} \in N_e(Z_{(j)}, \tau)$. So, there are at least $0 + 1 + 2 + \ldots + (n_{m+1} - 1) = n_{m+1}(n_{m+1} - 1)/2$ arcs in $D_{[m+1]}(\tau,c)$. Then $\rho_{[m+1]}(\tau,c) \ge (n_{m+1}(n_{m+1} - 1)/2)/(n_{m+1}(n_{m+1} - 1)) = 1/2$. By symmetry, the same results hold for $i = 1$. ∎
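
The counting argument in the proof can be illustrated numerically. The following sketch, with a hypothetical helper `right_end_density` and an arbitrary deterministic set of points to the right of $Y_{(m)} = 1$, evaluates the relative density in the right end interval for a few values of $\tau \ge 1$.

```python
def right_end_density(zs, y_m, tau):
    """Relative density of the digraph on distinct points zs > y_m (right end interval)."""
    n = len(zs)
    if n <= 1:
        return 0.0
    arc_count = 0
    for zj in zs:
        lower = max(zj - tau * (zj - y_m), y_m)   # lower end of N_e(zj, tau)
        upper = zj + tau * (zj - y_m)             # upper end of N_e(zj, tau)
        arc_count += sum(1 for zk in zs if zk != zj and lower < zk < upper)
    return arc_count / (n * (n - 1))


# Thirty distinct, arbitrarily chosen points in the right end interval (1, infinity).
zs = [1.0 + 0.37 * k ** 1.5 for k in range(1, 31)]
densities = [right_end_density(zs, 1.0, tau) for tau in (1.0, 1.5, 4.0)]
```

For $\tau \ge 1$ every point sends an arc to all points between itself and $Y_{(m)}$, so each value in `densities` is at least $1/2$, in line with Lemma 3.1.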

Using Lemma 3.1, we obtain the following lower bound for $\rho_{n,m}(\tau,c)$ for $\tau \ge 1$.

Theorem 3.2. Let $D_{n,m}(\tau,c)$ be an $\mathcal{F}(\mathbb{R})$-random $\mathscr{D}_{n,m}(\tau,c)$-digraph with $n > 0$, $m > 0$, and $k_1$ and $k_2$ be two natural numbers defined as $k_1 := \sum_{i=2}^{m} \big( n_{i,1}(n_{i,1} - 1)/2 + n_{i,2}(n_{i,2} - 1)/2 \big)$ and $k_2 := \sum_{i \in \{1, m+1\}} n_i (n_i - 1)/2$, where $n_{i,1} := |\mathcal{X}_n \cap (Y_{(i-1)}, M_c^{[i]})|$ and $n_{i,2} := |\mathcal{X}_n \cap (M_c^{[i]}, Y_{(i)})|$. Then for $\tau \ge 1$, we have $(k_1 + k_2)/n_T \le \rho_{n,m}(\tau,c)$ a.s.

Proof: For $i \in \{1, (m+1)\}$, we have $k_2$ as in Lemma 3.1. Let $i \in \{2, 3, \ldots, m\}$ and $\mathcal{X}_{i,1} := \mathcal{X}_{[i]} \cap (Y_{(i-1)}, M_c^{[i]}) = \{U_1, U_2, \ldots, U_{n_{i,1}}\}$, and $\mathcal{X}_{i,2} := \mathcal{X}_{[i]} \cap (M_c^{[i]}, Y_{(i)}) = \{V_1, V_2, \ldots, V_{n_{i,2}}\}$. Furthermore, let $U_{(j)}$ and $V_{(j)}$ be the corresponding order statistics. For $\tau \ge 1$, there is an arc from $U_{(j)}$ to $U_{(k)}$ for $k < j$, $j, k \in \{1, 2, \ldots, n_{i,1}\}$ and possibly to some other $U_l$, and similarly there is an arc from $V_{(j)}$ to $V_{(k)}$ for $k > j$, $j, k \in \{1, 2, \ldots, n_{i,2}\}$ and possibly to some other $V_l$. Thus there are at least $\frac{n_{i,1}(n_{i,1} - 1)}{2} + \frac{n_{i,2}(n_{i,2} - 1)}{2}$ arcs in $D_{[i]}(\tau,c)$. Hence $\rho_{n,m}(\tau,c) \ge (k_1 + k_2)/n_T$. ∎
Theorem 3.3. For $i = 1, 2, 3, \ldots, m + 1$, $\tau = \infty$, and $n_i > 0$, we have $\rho_{[i]}(\tau = \infty, c) = I(n_i > 1)$ and $\rho_{n,m}(\tau = \infty, c) = 1$ a.s.

Proof: For $\tau = \infty$, if $n_i \le 1$, then $\rho_{[i]}(\tau = \infty, c) = 0$. So we assume $n_i > 1$ and let $i = m + 1$. Then $N_e(x, \infty) = (Y_{(m)}, \infty)$ for all $x \in (Y_{(m)}, \infty)$. Hence $D_{[m+1]}(\infty, c)$ is a complete symmetric digraph of order $n_{m+1}$, which implies $\rho_{[m+1]}(\tau = \infty, c) = 1$. By symmetry, the same holds for $i = 1$. For $i \in \{2, 3, \ldots, m\}$ and $n_i > 1$, we have $N(x, \infty, c) = I_i$ for all $x \in I_i$, hence $D_{[i]}(\infty, c)$ is a complete symmetric digraph of order $n_i$, which implies $\rho_{[i]}(\infty, c) = 1$. Then $\rho_{n,m}(\infty, c) = \frac{1}{n_T} \sum_{i=1}^{m+1} n_i (n_i - 1)\, \rho_{[i]}(\infty, c) = 1$, since when $n_i \le 1$, $n_i$ has no contribution to $n_T$, and when $n_i > 1$, we have $\rho_{[i]}(\infty, c) = 1$. ∎



4 The Distribution of the Relative Density of Central Similarity 
PCDs for Uniform Data 

Let $-\infty < \delta_1 < \delta_2 < \infty$, $\mathcal{Y}_m$ be a random sample from non-atomic $F_Y$ with support $\mathcal{S}(F_Y) \subseteq (\delta_1, \delta_2)$, and $\mathcal{X}_n = \{X_1, X_2, \ldots, X_n\}$ be a random sample from $F_X = \mathcal{U}(\delta_1, \delta_2)$, the uniform distribution on $(\delta_1, \delta_2)$. So we have $F_{X,Y} \in \mathcal{F}(\mathbb{R})$. Assuming we have the realization of $\mathcal{Y}_m$ as $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\} = \{y_{(1)}, y_{(2)}, \ldots, y_{(m)}\}$ with $\delta_1 < y_{(1)} < y_{(2)} < \ldots < y_{(m)} < \delta_2$, we let $y_{(0)} := \delta_1$ and $y_{(m+1)} := \delta_2$. Then it follows that the distribution of $X_i$ restricted to $I_i$ is $F_X|_{I_i} = \mathcal{U}(I_i)$. We call such digraphs $\mathcal{U}(\delta_1, \delta_2)$-random $\mathscr{D}_{n,m}(\tau,c)$-digraphs and provide the distribution of their relative density for the whole range of $\tau$ and $c$. We first present a "scale invariance" result for central similarity PCDs. This invariance property will simplify the notation in our subsequent analysis by allowing us to consider the special case of the unit interval $(0, 1)$.

Theorem 4.1. (Scale Invariance Property) Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(\delta_1, \delta_2)$ where $\delta_1 < \delta_2$ and $\mathcal{Y}_m$ is a set of $m$ distinct $\mathcal{Y}$ points in $(\delta_1, \delta_2)$. Then for any $\tau > 0$, the distribution of $\rho_{[i]}(\tau,c)$ is independent of $\mathcal{Y}_{[i]}$ (and hence of the restricted support interval $I_i$) for all $i \in \{1, 2, \ldots, m + 1\}$.

Proof: Let $\delta_1 < \delta_2$ and $\mathcal{Y}_m$ be as in the hypothesis. Any $\mathcal{U}(\delta_1, \delta_2)$ random variable can be transformed into a $\mathcal{U}(0, 1)$ random variable by $\phi(x) = (x - \delta_1)/(\delta_2 - \delta_1)$, which maps intervals $(t_1, t_2) \subseteq (\delta_1, \delta_2)$ to intervals $(\phi(t_1), \phi(t_2)) \subseteq (0, 1)$. That is, if $X \sim \mathcal{U}(\delta_1, \delta_2)$, then we have $\phi(X) \sim \mathcal{U}(0, 1)$ and $P(X \in (t_1, t_2)) = P(\phi(X) \in (\phi(t_1), \phi(t_2)))$ for all $(t_1, t_2) \subseteq (\delta_1, \delta_2)$. The distribution of $\rho_{[i]}(\tau,c)$ is obtained by calculating such probabilities. So, without loss of generality, we can assume $\mathcal{X}_{[i]}$ is a set of iid random variables from the $\mathcal{U}(0, 1)$ distribution. That is, the distribution of $\rho_{[i]}(\tau,c)$ does not depend on $\mathcal{Y}_{[i]}$ and hence does not depend on the restricted support interval $I_i$. ∎
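
A small deterministic check may help to see why the affine map $\phi$ leaves the digraph unchanged. The sketch below counts arcs for a middle interval before and after rescaling, using exact rational arithmetic; the interval $(2, 5)$, the point set, the parameter values, and the helper name `middle_arc_count` are all assumptions made for illustration.

```python
from fractions import Fraction


def middle_arc_count(xs, lo, hi, tau, c):
    """Number of arcs of the central similarity PCD on points xs inside (lo, hi)."""
    m_c = lo + c * (hi - lo)
    count = 0
    for x in xs:
        if x < m_c:
            a, b = x - tau * (x - lo), x + tau * (1 - c) * (x - lo) / c
        else:
            a, b = x - c * tau * (hi - x) / (1 - c), x + tau * (hi - x)
        a, b = max(a, lo), min(b, hi)
        count += sum(1 for z in xs if z != x and a < z < b)
    return count


lo, hi = Fraction(2), Fraction(5)                        # hypothetical (delta_1, delta_2)
xs = [lo + Fraction(3 * k, 37) for k in range(1, 37)]    # exact rational points in (2, 5)
tau, c = Fraction(3, 4), Fraction(1, 3)


def phi(x):
    """The map phi from the proof of Theorem 4.1."""
    return (x - lo) / (hi - lo)


original = middle_arc_count(xs, lo, hi, tau, c)
rescaled = middle_arc_count([phi(x) for x in xs], Fraction(0), Fraction(1), tau, c)
```

Because every endpoint of a proximity region is an affine function of $x$ and the interval endpoints, membership relations are preserved exactly, and the two counts agree.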

Note that scale invariance of $\rho_{[i]}(\tau = \infty, c)$ follows trivially for all $\mathcal{X}_n$ from any $F_X$ with support in
$(\delta_1, \delta_2)$ with $\delta_1 < \delta_2$, since for $\tau = \infty$, we have $\rho_{[i]}(\tau = \infty, c) = 1$ a.s. for non-atomic $F_X$.
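
Based on Theorem 4.1, we may assume each $I_i$ to be the unit interval $(0, 1)$ for uniform data. Then the central similarity proximity region for $x \in (0, 1)$ with parameters $c \in (0, 1)$ and $\tau > 0$ has the following forms. If $x \in I_i$ for $i \in \{2, \ldots, m\}$ (i.e., in the middle intervals), when transformed under $\phi(\cdot)$ to $(0, 1)$, we have

\[ N(x,\tau,c) = \begin{cases} \big( x\,(1 - \tau),\; x\,(c + (1 - c)\,\tau)/c \big) \cap (0, 1) & \text{if } x \in (0, c), \\ \big( x - c\,\tau\,(1 - x)/(1 - c),\; x + (1 - x)\,\tau \big) \cap (0, 1) & \text{if } x \in (c, 1). \end{cases} \tag{9} \]

In particular, for $\tau \in (0, 1)$, we have

\[ N(x,\tau,c) = \begin{cases} \big( x\,(1 - \tau),\; x\,(c + (1 - c)\,\tau)/c \big) & \text{if } x \in (0, c), \\ \big( x - c\,\tau\,(1 - x)/(1 - c),\; x + (1 - x)\,\tau \big) & \text{if } x \in (c, 1), \end{cases} \tag{10} \]

and for $\tau \ge 1$, we have

\[ N(x,\tau,c) = \begin{cases} \big( 0,\; x\,(c + (1 - c)\,\tau)/c \big) & \text{if } x \in \left( 0, \frac{c}{c + (1 - c)\,\tau} \right), \\ (0, 1) & \text{if } x \in \left[ \frac{c}{c + (1 - c)\,\tau}, \frac{c\,\tau}{1 - c + c\,\tau} \right], \\ \big( x - c\,\tau\,(1 - x)/(1 - c),\; 1 \big) & \text{if } x \in \left( \frac{c\,\tau}{1 - c + c\,\tau}, 1 \right), \end{cases} \tag{11} \]

and $N(x = c, \tau, c)$ is arbitrarily taken to be one of $(x\,(1 - \tau), x\,(c + (1 - c)\,\tau)/c) \cap (0, 1)$ or $(x - c\,\tau\,(1 - x)/(1 - c), x + (1 - x)\,\tau) \cap (0, 1)$. This special case of "$X = c$" happens with probability zero for uniform $X$.

If $x \in I_1$ (i.e., in the left end interval), when transformed under $\phi(\cdot)$ to $(0, 1)$, we have $N_e(x,\tau) = (\max(0, x - \tau\,(1 - x)), \min(1, x + \tau\,(1 - x)))$; and if $x \in I_{m+1}$ (i.e., in the right end interval), when transformed under $\phi(\cdot)$ to $(0, 1)$, we have $N_e(x,\tau) = (\max(0, x\,(1 - \tau)), \min(1, x\,(1 + \tau)))$.

Notice that each subdigraph $D_{[i]}(\tau,c)$ is itself a $\mathcal{U}(I_i)$-random $\mathscr{D}_{n_i,2}(\tau,c)$-digraph. The distribution of the relative density of $D_{[i]}(\tau,c)$ is given in the following result.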

Theorem 4.2. Let $\rho_{[i]}(\tau,c)$ be the relative density of subdigraph $D_{[i]}(\tau,c)$ of the central similarity PCD based on uniform data in $(\delta_1, \delta_2)$ where $\delta_1 < \delta_2$ and $\mathcal{Y}_m$ be a set of $m$ distinct $\mathcal{Y}$ points in $(\delta_1, \delta_2)$. Then for $\tau \in (0, \infty)$, as $n_i \to \infty$, we have

(i) for $i \in \{2, \ldots, m\}$, $\sqrt{n_i}\,[\rho_{[i]}(\tau,c) - \mu(\tau,c)] \xrightarrow{\mathcal{L}} \mathcal{N}(0, 4\,\nu(\tau,c))$, where $\mu(\tau,c) = E[\rho_{[i]}(\tau,c)]$ is the arc probability and $\nu(\tau,c) = \mathrm{Cov}[h_{12}, h_{13}]$ in the middle intervals, and

(ii) for $i \in \{1, m+1\}$, $\sqrt{n_i}\,[\rho_{[i]}(\tau,c) - \mu_e(\tau)] \xrightarrow{\mathcal{L}} \mathcal{N}(0, 4\,\nu_e(\tau))$, where $\mu_e(\tau) = E[\rho_{[i]}(\tau,c)]$ is the arc probability and $\nu_e(\tau) = \mathrm{Cov}[h_{12}, h_{13}]$ in the end intervals.



Proof: (i) Let $i \in \{2, \ldots, m\}$ (i.e., $I_i$ be a middle interval). By the scale invariance for uniform data (see Theorem 4.1), a middle interval can be assumed to be the unit interval $(0, 1)$. The mean of the asymptotic distribution of $\rho_{[i]}(\tau,c)$ is computed as follows.

\[ E[\rho_{[i]}(\tau,c)] = E[h_{12}] = P(X_2 \in N(X_1, \tau, c)) = \mu(\tau,c) \]

which is the arc probability. And the asymptotic variance of $\rho_{[i]}(\tau,c)$ is $4\,\mathrm{Cov}[h_{12}, h_{13}] = 4\,\nu(\tau,c)$. For $\tau \in (0, \infty)$, since $2 h_{12} = I(X_2 \in N(X_1, \tau, c)) + I(X_1 \in N(X_2, \tau, c))$ is the number of arcs between $X_1$ and $X_2$ in the PCD, $h_{12}$ tends to be high if the proximity region $N(X_1, \tau, c)$ is large. In such a case, $h_{13}$ tends to be high also. That is, $h_{12}$ and $h_{13}$ tend to be high and low together. So, for $\tau \in (0, \infty)$, we have $\nu(\tau,c) > 0$. Hence asymptotic normality follows.

(ii) In an end interval, the mean of the asymptotic distribution of $\rho_{[i]}(\tau,c)$ is

\[ E[\rho_{[i]}(\tau,c)] = E[h_{12}] = P(X_2 \in N_e(X_1, \tau)) = \mu_e(\tau), \]

and the asymptotic variance of $\rho_{[i]}(\tau,c)$ is $4\,\mathrm{Cov}[h_{12}, h_{13}] = 4\,\nu_e(\tau)$. For $\tau \in (0, \infty)$, as in (i), we have $\nu_e(\tau) > 0$. Hence asymptotic normality follows. ∎

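Let $P_{2N} := P(\{X_2, X_3\} \subseteq N(X_1, \tau, c))$, $P_{NG} := P(X_2 \in N(X_1, \tau, c), X_3 \in \Gamma_1(X_1, \tau, c))$, and $P_{2G} := P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, \tau, c))$. Then

\[ \mathrm{Cov}[h_{12}, h_{13}] = E[h_{12} h_{13}] - E[h_{12}]\,E[h_{13}] = E[h_{12} h_{13}] - \mu(\tau,c)^2 = (P_{2N} + 2\,P_{NG} + P_{2G})/4 - \mu(\tau,c)^2, \]

since

\[ 4\,E[h_{12} h_{13}] = P(\{X_2, X_3\} \subseteq N(X_1, \tau, c)) + 2\,P(X_2 \in N(X_1, \tau, c), X_3 \in \Gamma_1(X_1, \tau, c)) + P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, \tau, c)) = P_{2N} + 2\,P_{NG} + P_{2G}. \]

Similarly, let $P_{2N,e} := P(\{X_2, X_3\} \subseteq N_e(X_1, \tau))$, $P_{NG,e} := P(X_2 \in N_e(X_1, \tau), X_3 \in \Gamma_{1,e}(X_1, \tau))$, and $P_{2G,e} := P(\{X_2, X_3\} \subseteq \Gamma_{1,e}(X_1, \tau))$. Then

\[ \mathrm{Cov}[h_{12}, h_{13}] = (P_{2N,e} + 2\,P_{NG,e} + P_{2G,e})/4 - \mu_e(\tau)^2. \]

For $\tau = \infty$, we have $N(x, \infty, c) = I_i$ for all $x \in I_i$ with $i \in \{2, \ldots, m\}$ and $N_e(x, \infty) = I_i$ for all $x \in I_i$ with $i \in \{1, m+1\}$. Then for $i \in \{2, \ldots, m\}$

\[ E[\rho_{[i]}(\infty, c)] = E[h_{12}] = \mu(\infty, c) = P(X_2 \in N(X_1, \infty, c)) = P(X_2 \in I_i) = 1. \]

On the other hand, $4\,E[h_{12} h_{13}] = P(\{X_2, X_3\} \subseteq N(X_1, \infty, c)) + 2\,P(X_2 \in N(X_1, \infty, c), X_3 \in \Gamma_1(X_1, \infty, c)) + P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, \infty, c)) = (1 + 2 + 1)$. Hence $E[h_{12} h_{13}] = 1$ and so $\nu(\infty, c) = 0$. Similarly, for $i \in \{1, m+1\}$, we have $\mu_e(\infty) = 1$ and $\nu_e(\infty) = 0$. Therefore, the CLT result does not hold for $\tau = \infty$. Furthermore, $\rho_{[i]}(\tau = \infty, c) = 1$ a.s.

It follows that $\nu(\tau,c) > 0$ (and $\nu_e(\tau) > 0$) iff $P_{2N} + 2\,P_{NG} + P_{2G} > 4\,\mu(\tau,c)^2$ (and $P_{2N,e} + 2\,P_{NG,e} + P_{2G,e} > 4\,\mu_e(\tau)^2$).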

Remark 4.3. The Joint Distribution of $(h_{12}, h_{13})$: The pair $(h_{12}, h_{13})$ is a bivariate discrete random variable with nine possible values such that

\[ (2 h_{12}, 2 h_{13}) \in \{(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)\}. \]

Then finding the joint distribution of $(h_{12}, h_{13})$ is equivalent to finding the joint probability mass function of $(h_{12}, h_{13})$. Hence the joint distribution of $(h_{12}, h_{13})$ can be found by calculating the probabilities such as $P((h_{12}, h_{13}) = (0, 0)) = P(\{X_2, X_3\} \subseteq I_i \setminus (N(X_1, \tau, c) \cup \Gamma_1(X_1, \tau, c)))$. □



4.1 The Distribution of Relative Density of $\mathcal{U}(y_1, y_2)$-random $\mathscr{D}_{n,2}(\tau,c)$-digraphs

In the special case of $m = 2$ with $\mathcal{Y}_2 = \{y_1, y_2\}$ and $\delta_1 = y_1 < y_2 = \delta_2$, we have only one middle interval and the two end intervals are empty. In this section, we consider the relative density of central similarity PCD based on uniform data in $(y_1, y_2)$. By Theorems 4.1 and 4.2, the asymptotic distribution of any $\rho_{[i]}(\tau,c)$ for the middle intervals for $m > 2$ will be identical to the asymptotic distribution of the relative density of the $\mathcal{U}(y_1, y_2)$-random $\mathscr{D}_{n,2}(\tau,c)$-digraph.

First we consider the simplest case of $\tau = 1$ and $c = 1/2$. By Theorem 4.1, without loss of generality, we can assume $(y_1, y_2)$ to be the unit interval $(0, 1)$. Then $N(x, 1, 1/2) = B(x, r(x))$ where $r(x) = \min(x, 1 - x)$ for $x \in (0, 1)$. Hence the central similarity PCD based on $N(x, 1, 1/2)$ is equivalent to the CCCD of Priebe et al. (2001). Moreover, we have $\Gamma_1(X_1, 1, 1/2) = (X_1/2, (1 + X_1)/2)$.

Theorem 4.4. As $n \to \infty$, we have $\sqrt{n}\,[\rho_{n,2}(1, 1/2) - \mu(1, 1/2)] \xrightarrow{\mathcal{L}} \mathcal{N}(0, 4\,\nu(1, 1/2))$, where $\mu(1, 1/2) = 1/2$ and $4\,\nu(1, 1/2) = 1/12$.

Proof: By symmetry, we only consider $X_1 \in (0, 1/2)$. Notice that for $x \in (0, 1/2)$, we have $N(x, 1, 1/2) = (0, 2x)$ and $\Gamma_1(x, 1, 1/2) = (x/2, (1 + x)/2)$. Hence $\mu(1, 1/2) = P(X_2 \in N(X_1, 1, 1/2)) = 2\,P(X_2 \in N(X_1, 1, 1/2), X_1 \in (0, 1/2))$ by symmetry. Here

\[ P(X_2 \in N(X_1, 1, 1/2), X_1 \in (0, 1/2)) = P(X_2 \in (0, 2 x_1), X_1 \in (0, 1/2)) = \int_0^{1/2} \int_0^{2 x_1} f_{1,2}(x_1, x_2)\,dx_2\,dx_1 = \int_0^{1/2} \int_0^{2 x_1} 1\,dx_2\,dx_1 = \int_0^{1/2} 2 x_1\,dx_1 = x_1^2 \Big|_0^{1/2} = 1/4. \]

Then $\mu(1, 1/2) = 2\,(1/4) = 1/2$.

For $\mathrm{Cov}(h_{12}, h_{13})$, we need to calculate $P_{2N}$, $P_{NG}$, and $P_{2G}$. The probability

\[ P_{2N} = P(\{X_2, X_3\} \subseteq N(X_1, 1, 1/2)) = 2\,P(\{X_2, X_3\} \subseteq N(X_1, 1, 1/2), X_1 \in (0, 1/2)) \]

and $P(\{X_2, X_3\} \subseteq N(X_1, 1, 1/2), X_1 \in (0, 1/2)) = \int_0^{1/2} (2 x_1)^2\,dx_1 = 1/6$. So $P_{2N} = 2\,(1/6) = 1/3$.

$P_{NG} = 2\,P(X_2 \in N(X_1, 1, 1/2), X_3 \in \Gamma_1(X_1, 1, 1/2), X_1 \in (0, 1/2))$ and

\[ P(X_2 \in N(X_1, 1, 1/2), X_3 \in \Gamma_1(X_1, 1, 1/2), X_1 \in (0, 1/2)) = \int_0^{1/2} (2 x_1)(1/2)\,dx_1 = 1/8. \]

Then $P_{NG} = 2\,(1/8) = 1/4$.

Finally, we have $P_{2G} = 2\,P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, 1, 1/2), X_1 \in (0, 1/2))$ and $P(\{X_2, X_3\} \subseteq \Gamma_1(X_1, 1, 1/2), X_1 \in (0, 1/2)) = \int_0^{1/2} (1/4)\,dx_1 = 1/8$. So $P_{2G} = 2\,(1/8) = 1/4$.

Therefore $4\,E[h_{12} h_{13}] = 1/3 + 2\,(1/4) + 1/4 = 13/12$. Hence $4\,\nu(1, 1/2) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = 13/12 - 4\,(1/2)^2 = 1/12$. ∎
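
The arithmetic in this proof is easy to replay exactly. The sketch below redoes the four integrals over $(0, 1/2)$ with rational arithmetic; the helper `integral_monomial` is hypothetical and only integrates a single monomial.

```python
from fractions import Fraction as F


def integral_monomial(coef, power, a, b):
    """Exact integral of coef * x**power over (a, b)."""
    return coef * (b ** (power + 1) - a ** (power + 1)) / (power + 1)


half = F(1, 2)
# For x in (0, 1/2): |N(x, 1, 1/2)| = 2x and |Gamma_1(x, 1, 1/2)| = 1/2.
mu = 2 * integral_monomial(F(2), 1, F(0), half)        # 2 * int 2x dx
p_2n = 2 * integral_monomial(F(4), 2, F(0), half)      # 2 * int (2x)^2 dx
p_ng = 2 * integral_monomial(F(1), 1, F(0), half)      # 2 * int (2x)(1/2) dx
p_2g = 2 * integral_monomial(F(1, 4), 0, F(0), half)   # 2 * int (1/2)^2 dx
four_nu = p_2n + 2 * p_ng + p_2g - 4 * mu ** 2         # asymptotic variance 4 * nu
```

The leading factor 2 in each line is the symmetry step that doubles the contribution of $X_1 \in (0, 1/2)$.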

The sharpest rate of convergence in Theorem 4.4 is $K\,\dfrac{\mu(1, 1/2)}{\sqrt{n\,\nu(1, 1/2)^3}} = 12 \sqrt{3}\,\dfrac{K}{\sqrt{n}}$.

Next we consider the more general case of $\tau = 1$ and $c \in (0, 1)$. For $x \in (0, 1)$, the proximity region has the following form:

\[ N(x, 1, c) = \begin{cases} (0, x/c) & \text{if } x \in (0, c), \\ ((x - c)/(1 - c), 1) & \text{if } x \in (c, 1), \end{cases} \]

and the $\Gamma_1$-region is $\Gamma_1(x, 1, c) = (c\,x, (1 - c)\,x + c)$.

Theorem 4.5. As $n \to \infty$, for $c \in (0, 1)$, we have $\sqrt{n}\,[\rho_{n,2}(1, c) - \mu(1, c)] \xrightarrow{\mathcal{L}} \mathcal{N}(0, 4\,\nu(1, c))$, where $\mu(1, c) = 1/2$ and $4\,\nu(1, c) = c\,(1 - c)/3$.
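
The stated mean and variance can be checked numerically from the region lengths alone. The sketch below is not the Appendix proof; it is a midpoint-rule approximation that assumes the forms of $N(x, 1, c)$ and $\Gamma_1(x, 1, c)$ given above and uses the fact that, for uniform data on $(0, 1)$, probabilities conditional on $X_1 = x$ are interval lengths. The function name `moments_tau1` and the grid size are arbitrary choices.

```python
def moments_tau1(c, cells=20000):
    """Midpoint-rule estimates of mu(1, c) and 4 * nu(1, c) on the unit interval."""
    h = 1.0 / cells
    mu = p_2n = p_ng = p_2g = 0.0
    for k in range(cells):
        x = (k + 0.5) * h
        n_len = x / c if x < c else (1.0 - x) / (1.0 - c)   # length of N(x, 1, c)
        g_len = (1.0 - c) * x + c - c * x                   # length of Gamma_1(x, 1, c)
        mu += n_len * h
        p_2n += n_len ** 2 * h          # P({X2, X3} in N(X1))
        p_ng += n_len * g_len * h       # P(X2 in N(X1), X3 in Gamma_1(X1))
        p_2g += g_len ** 2 * h          # P({X2, X3} in Gamma_1(X1))
    return mu, p_2n + 2.0 * p_ng + p_2g - 4.0 * mu ** 2


results = {c: moments_tau1(c) for c in (0.2, 0.5, 0.7)}
```

For each value of $c$ tried here the estimates agree with $\mu(1, c) = 1/2$ and $4\,\nu(1, c) = c\,(1 - c)/3$ to within the quadrature error.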



Proof is provided in Appendix 1. See Figure 2 for $4\,\nu(1, c)$ with $c \in (0, 1/2)$. Notice that $\mu(1, c)$ is constant (i.e., independent of $c$) and $\nu(1, c)$ is symmetric around $c = 1/2$ with $\nu(1, c) = \nu(1, 1 - c)$. Notice also that for $c = 1/2$, we have $\mu(1, c = 1/2) = 1/2$, and $4\,\nu(1, c = 1/2) = 1/12$, hence as $c \to 1/2$, the distribution of $\rho_{n,2}(1, c)$ converges to the one in Theorem 4.4. Furthermore, the sharpest rate of convergence in Theorem 4.5 is

\[ K\,\frac{\mu(1, c)}{\sqrt{n\,\nu(1, c)^3}} = \frac{3 \sqrt{3}}{2}\,\frac{K}{\sqrt{n\,c^3 (1 - c)^3}} \]

Figure 2: The plot of the asymptotic variance $4\,\nu(1, c)$ as a function of $c$ for $c \in (0, 1)$.



and is minimized at c = 1/2 (which can easily be verified). 
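As a sanity check on Theorem 4.5, the following minimal sketch integrates the lengths of $N(x_1,1,c)$ and $\Gamma_1(x_1,1,c)$ numerically and compares the result with $\mu(1,c) = 1/2$ and $4\nu(1,c) = c(1-c)/3$. The function names (`midpoint`, `moments_tau_one`) are illustrative and not from the paper; the integration rule is an ordinary composite midpoint rule, chosen only for simplicity.

```python
def midpoint(f, a, b, steps=2000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def n_length(x, c):
    # Length of N(x, 1, c): (0, x/c) on (0, c) and ((x - c)/(1 - c), 1) on (c, 1).
    return x / c if x < c else (1.0 - x) / (1.0 - c)


def gamma_length(x, c):
    # Length of Gamma_1(x, 1, c) = (c x, (1 - c) x + c).
    return (1.0 - c) * x + c - c * x


def moments_tau_one(c):
    """Return (mu, 4*nu) for tau = 1, integrating over (0, c) and (c, 1) separately."""
    def integral(f):
        return midpoint(f, 0.0, c) + midpoint(f, c, 1.0)

    mu = integral(lambda x: n_length(x, c))
    p_2n = integral(lambda x: n_length(x, c) ** 2)
    p_ng = integral(lambda x: n_length(x, c) * gamma_length(x, c))
    p_2g = integral(lambda x: gamma_length(x, c) ** 2)
    # 4 Cov[h12, h13] = P_2N + 2 P_NG + P_2G - 4 mu^2
    return mu, p_2n + 2.0 * p_ng + p_2g - 4.0 * mu ** 2


for c in (0.2, 0.5, 0.7):
    mu, four_nu = moments_tau_one(c)
    print(c, mu, four_nu, c * (1 - c) / 3)
```

The two printed variance columns should agree to several decimal places for any $c \in (0,1)$, since all integrands are polynomials on each piece.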

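Next we consider the case of $\tau > 0$ and $c = 1/2$. By symmetry, we only consider $X_1 \in (0,1/2)$. For $x \in (0,1)$, the proximity region for $\tau \in (0,1)$ is

\[ N(x,\tau,1/2) = \begin{cases} (x\,(1-\tau),\ x\,(1+\tau)) & \text{if } x \in (0,1/2), \\ (x - (1-x)\,\tau,\ x + (1-x)\,\tau) & \text{if } x \in (1/2,1), \end{cases} \tag{14} \]

and for $\tau \ge 1$

\[ N(x,\tau,1/2) = \begin{cases} (0,\ x\,(1+\tau)) & \text{if } x \in (0,\ 1/(1+\tau)), \\ (0,\ 1) & \text{if } x \in (1/(1+\tau),\ \tau/(1+\tau)), \\ (x - (1-x)\,\tau,\ 1) & \text{if } x \in (\tau/(1+\tau),\ 1). \end{cases} \tag{15} \]

And the $\Gamma_1$-region for $\tau \in (0,1)$ is

\[ \Gamma_1(x,\tau,1/2) = \begin{cases} (x/(1+\tau),\ x/(1-\tau)) & \text{if } x \in (0,\ (1-\tau)/2), \\ (x/(1+\tau),\ (x+\tau)/(1+\tau)) & \text{if } x \in ((1-\tau)/2,\ (1+\tau)/2), \\ ((x-\tau)/(1-\tau),\ (x+\tau)/(1+\tau)) & \text{if } x \in ((1+\tau)/2,\ 1), \end{cases} \tag{16} \]

and for $\tau \ge 1$, we have $\Gamma_1(x,\tau,1/2) = (x/(1+\tau),\ (x+\tau)/(1+\tau))$.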
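The proximity and $\Gamma_1$-regions for $c = 1/2$ are dual to each other: $y \in \Gamma_1(x,\tau,1/2)$ exactly when $x \in N(y,\tau,1/2)$. The sketch below encodes both regions and checks this duality on a grid. It assumes that $N(x,\tau,1/2)$ can be written as $x \mp \tau\min(x,1-x)$ clipped to $(0,1)$, which reproduces the piecewise forms in (14) and (15); the function names and grid sizes are illustrative choices, not taken from the paper.

```python
def prox(x, tau):
    """N(x, tau, 1/2) on (0, 1): x -/+ tau * (distance to nearer end), clipped to (0, 1)."""
    r = tau * min(x, 1.0 - x)
    return max(0.0, x - r), min(1.0, x + r)


def gamma1(x, tau):
    """Gamma_1(x, tau, 1/2), following the piecewise formula (16) and its tau >= 1 form."""
    lo, hi = x / (1.0 + tau), (x + tau) / (1.0 + tau)
    if tau < 1.0:
        if x < (1.0 - tau) / 2.0:
            hi = x / (1.0 - tau)
        elif x > (1.0 + tau) / 2.0:
            lo = (x - tau) / (1.0 - tau)
    return lo, hi


def duality_holds(tau, nx=97, ny=101):
    """Check on a grid that x in N(y) if and only if y in Gamma_1(x)."""
    for i in range(nx):
        x = (i + 0.5) / nx
        g_lo, g_hi = gamma1(x, tau)
        for j in range(ny):
            y = (j + 0.5) / ny
            n_lo, n_hi = prox(y, tau)
            if (n_lo < x < n_hi) != (g_lo < y < g_hi):
                return False
    return True


print(duality_holds(0.6), duality_holds(1.5))
```

The grid sizes are deliberately coprime so that no grid pair falls exactly on a region boundary for the two values of $\tau$ used here.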

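Theorem 4.6. For $\tau \in (0,\infty)$, we have $\sqrt{n}\,[\rho_{n,2}(\tau,1/2) - \mu(\tau,1/2)] \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, 4\,\nu(\tau,1/2))$ as $n \to \infty$, where

\[ \mu(\tau,1/2) = \begin{cases} \tau/2 & \text{if } 0 < \tau < 1, \\ \tau/(\tau+1) & \text{if } \tau \ge 1, \end{cases} \tag{17} \]

and

\[ 4\,\nu(\tau,1/2) = \begin{cases} \dfrac{\tau^2\,(1 + 2\tau - \tau^2 - \tau^3)}{3\,(\tau+1)^2} & \text{if } 0 < \tau < 1, \\[2ex] \dfrac{2\tau - 1}{3\,(\tau+1)^2} & \text{if } \tau \ge 1. \end{cases} \tag{18} \]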



Proof is provided in Appendix 1. See Figure 3 for the plots of $\mu(\tau,1/2)$ and $4\,\nu(\tau,1/2)$. Notice that $\lim_{\tau\to\infty} \nu(\tau,1/2) = 0$, so the CLT result fails for $\tau = \infty$. Furthermore, $\lim_{\tau\to 0} \nu(\tau,1/2) = 0$. For $\tau = 1$, we have $\mu(\tau=1,c=1/2) = 1/2$ and $4\,\nu(\tau=1,c=1/2) = 1/12$; hence as $\tau \to 1$, the distribution of $\rho_{n,2}(\tau,1/2)$ converges to the one in Theorem 4.4. Furthermore, the sharpest rate of convergence in Theorem 4.6 is

\[ K\,\frac{\mu(\tau,1/2)}{\sqrt{n\,\nu(\tau,1/2)^3}} = \frac{K}{\sqrt{n}} \begin{cases} \dfrac{27\,(\tau+1)^3}{2\,\tau^2}\,(6\tau + 3 - 3\tau^2 - 3\tau^3)^{-3/2} & \text{if } 0 < \tau < 1, \\[2ex] \dfrac{3\sqrt{3}\,\tau\,(\tau+1)^2}{(2\tau-1)^{3/2}} & \text{if } \tau \ge 1, \end{cases} \tag{19} \]

and is minimized at $\tau \approx 0.73$, which is found by setting the first derivative of this rate with respect to $\tau$ to zero and solving for $\tau$ numerically. We also checked the plot of $\mu(\tau,1/2)/\sqrt{\nu(\tau,1/2)^3}$ (not presented) and verified that this is where the global minimum is attained.

Figure 3: The plots of the asymptotic mean $\mu(\tau,1/2)$ (left) and the variance $4\,\nu(\tau,1/2)$ (right) as a function of $\tau$ for $\tau \in (0,5]$.
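The location of the minimum can be reproduced with a simple grid search over the closed forms in Theorem 4.6. This is only a sketch: the function names are hypothetical, the search range $(0.05, 5]$ and step $0.001$ are arbitrary choices, and the code minimizes $\mu/(4\nu)^{3/2}$, which differs from $\mu/\nu^{3/2}$ only by a constant factor and therefore has the same minimizer.

```python
def mu_half(tau):
    """Asymptotic mean mu(tau, 1/2) from Theorem 4.6."""
    return tau / 2.0 if tau < 1.0 else tau / (tau + 1.0)


def four_nu_half(tau):
    """Asymptotic variance 4 nu(tau, 1/2) from Theorem 4.6."""
    if tau < 1.0:
        return tau ** 2 * (1 + 2 * tau - tau ** 2 - tau ** 3) / (3.0 * (tau + 1.0) ** 2)
    return (2.0 * tau - 1.0) / (3.0 * (tau + 1.0) ** 2)


def rate_factor(tau):
    # n-free part of the rate, up to the constant K and a fixed numeric factor
    return mu_half(tau) / four_nu_half(tau) ** 1.5


# Plain grid search on (0.05, 5]; fine enough to locate the minimizer to ~1e-3.
grid = [0.05 + 0.001 * k for k in range(4951)]
best_tau = min(grid, key=rate_factor)
print(best_tau, rate_factor(best_tau))
```

A grid search is used instead of a root finder for the derivative because the rate has a kink at $\tau = 1$, where the two branches of Theorem 4.6 meet.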



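Finally, we consider the most general case of $\tau > 0$ and $c \in (0,1/2)$. For $\tau \in (0,1)$, the proximity region is

\[ N(x,\tau,c) = \begin{cases} \left(x\,(1-\tau),\ x\left(1 + \frac{(1-c)\,\tau}{c}\right)\right) & \text{if } x \in (0,c), \\[1ex] \left(x - \frac{c\,\tau\,(1-x)}{1-c},\ x + (1-x)\,\tau\right) & \text{if } x \in (c,1), \end{cases} \tag{20} \]

and the $\Gamma_1$-region is

\[ \Gamma_1(x,\tau,c) = \begin{cases} \left(\frac{c\,x}{c+(1-c)\,\tau},\ \frac{x}{1-\tau}\right) & \text{if } x \in (0,\ c\,(1-\tau)), \\[1ex] \left(\frac{c\,x}{c+(1-c)\,\tau},\ \frac{x\,(1-c)+c\,\tau}{1-c+c\,\tau}\right) & \text{if } x \in (c\,(1-\tau),\ c\,(1-\tau)+\tau), \\[1ex] \left(\frac{x-\tau}{1-\tau},\ \frac{x\,(1-c)+c\,\tau}{1-c+c\,\tau}\right) & \text{if } x \in (c\,(1-\tau)+\tau,\ 1). \end{cases} \tag{21} \]

For $\tau \ge 1$, the proximity region is

\[ N(x,\tau,c) = \begin{cases} \left(0,\ x\left(1 + \frac{(1-c)\,\tau}{c}\right)\right) & \text{if } x \in \left(0,\ \frac{c}{c+(1-c)\,\tau}\right), \\[1ex] (0,\ 1) & \text{if } x \in \left(\frac{c}{c+(1-c)\,\tau},\ \frac{c\,\tau}{1-c+c\,\tau}\right), \\[1ex] \left(x - \frac{c\,\tau\,(1-x)}{1-c},\ 1\right) & \text{if } x \in \left(\frac{c\,\tau}{1-c+c\,\tau},\ 1\right), \end{cases} \tag{22} \]

and the $\Gamma_1$-region is

\[ \Gamma_1(x,\tau,c) = \left(\frac{c\,x}{c+(1-c)\,\tau},\ \frac{x\,(1-c)+c\,\tau}{1-c+c\,\tau}\right). \tag{23} \]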

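Theorem 4.7. For $\tau \in (0,\infty)$, we have $\sqrt{n}\,[\rho_{n,2}(\tau,c) - \mu(\tau,c)] \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, 4\,\nu(\tau,c))$ as $n \to \infty$, where $\mu(\tau,c) = \mu_1(\tau,c)\,\mathbf{I}(0 < c \le 1/2) + \mu_2(\tau,c)\,\mathbf{I}(1/2 < c < 1)$ and $\nu(\tau,c) = \nu_1(\tau,c)\,\mathbf{I}(0 < c \le 1/2) + \nu_2(\tau,c)\,\mathbf{I}(1/2 < c < 1)$. For $0 < c \le 1/2$,

\[ \mu_1(\tau,c) = \begin{cases} \tau/2 & \text{if } 0 < \tau < 1, \\[1ex] \dfrac{\tau\,(1 - 2c + 2c^2 + 2c\tau - 2c^2\tau)}{2\,(c\tau - c + 1)(\tau + c - c\tau)} & \text{if } \tau \ge 1, \end{cases} \tag{24} \]

and

\[ 4\,\nu_1(\tau,c) = \begin{cases} \kappa_1(\tau,c) & \text{if } 0 < \tau < 1, \\ \kappa_2(\tau,c) & \text{if } \tau \ge 1, \end{cases} \tag{25} \]

where

\[ \kappa_1(\tau,c) = \frac{\tau^2\,(c^2\tau^3 - 3c^2\tau^2 - c\tau^3 + 2c^2\tau + 3c\tau^2 - c^2 - 2c\tau - \tau^2 + c + \tau)}{3\,(c\tau - c + 1)(c + \tau - c\tau)}, \]

and

\[ \kappa_2(\tau,c) = \frac{c\,(1-c)\,\big(2c^4\tau^5 - 7c^4\tau^4 - 4c^3\tau^5 + 8c^4\tau^3 + 14c^3\tau^4 + 3c^2\tau^5 - 2c^4\tau^2 - 16c^3\tau^3 - 7c^2\tau^4 - c\tau^5 - 2c^4\tau + 4c^3\tau^2 + 12c^2\tau^3 + c^4 + 4c^3\tau - 6c^2\tau^2 - 4c\tau^3 - 2c^3 - 3c^2\tau + 4c\tau^2 + c^2 + c\tau - \tau^2\big)}{3\,(c\tau - c + 1)^3\,(c\tau - c - \tau)^3}. \]

And for $1/2 < c < 1$, we have $\mu_2(\tau,c) = \mu_1(\tau,1-c)$ and $\nu_2(\tau,c) = \nu_1(\tau,1-c)$.

Figure 4: The surface plots of the asymptotic mean $\mu(\tau,c)$ (left) and the variance $4\,\nu(\tau,c)$ (right) as a function of $\tau$ and $c$ for $\tau \in (0,10]$ and $c \in (0,1)$, respectively.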

Proof is provided in Appendix 1. See Figure 4 for the plots of $\mu(\tau,c)$ and $4\,\nu(\tau,c)$. Notice that $\lim_{\tau\to\infty} \nu(\tau,c) = 0$, so the CLT result fails for $\tau = \infty$. Furthermore, $\lim_{\tau\to 0} \nu(\tau,c) = 0$. For $\tau = 1$ and $c = 1/2$, we have $\mu(\tau=1,c=1/2) = 1/2$ and $4\,\nu(\tau=1,c=1/2) = 1/12$; hence as $\tau \to 1$ and $c \to 1/2$, the distribution of $\rho_{n,2}(\tau,c)$ converges to the one in Theorem 4.4. The sharpest rate of convergence in Theorem 4.7 is $K\,\mu(\tau,c)/\sqrt{n\,\nu(\tau,c)^3}$ (the explicit form not presented) and is minimized at $\tau \approx 1.55$ and $c \approx 0.5$, which is found by setting the first order partial derivatives of this rate with respect to $\tau$ and $c$ to zero and solving for $\tau$ and $c$ numerically. We also checked the surface plot of this rate (not presented) and verified that this is where the global minimum is attained.
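The closed forms in Theorem 4.7 are long, so a small consistency check against the special cases is useful. The sketch below transcribes $\mu_1$, $\kappa_1$ and $\kappa_2$ and confirms that they reduce to Theorem 4.5 at $\tau = 1$ and to Theorem 4.6 at $c = 1/2$. The function names are illustrative, and the reflection `min(c, 1 - c)` is how this sketch implements $\mu_2(\tau,c) = \mu_1(\tau,1-c)$ and $\nu_2(\tau,c) = \nu_1(\tau,1-c)$.

```python
def mu(tau, c):
    """Asymptotic mean mu(tau, c) of Theorem 4.7."""
    c = min(c, 1.0 - c)  # mu_2(tau, c) = mu_1(tau, 1 - c)
    if tau < 1.0:
        return tau / 2.0
    num = tau * (1 - 2 * c + 2 * c ** 2 + 2 * c * tau - 2 * c ** 2 * tau)
    return num / (2.0 * (c * tau - c + 1) * (tau + c - c * tau))


def kappa1(t, c):
    """4 nu_1(tau, c) for 0 < tau < 1."""
    num = t ** 2 * (c**2*t**3 - 3*c**2*t**2 - c*t**3 + 2*c**2*t + 3*c*t**2
                    - c**2 - 2*c*t - t**2 + c + t)
    return num / (3.0 * (c * t - c + 1) * (c + t - c * t))


def kappa2(t, c):
    """4 nu_1(tau, c) for tau >= 1."""
    poly = (2*c**4*t**5 - 7*c**4*t**4 - 4*c**3*t**5 + 8*c**4*t**3
            + 14*c**3*t**4 + 3*c**2*t**5 - 2*c**4*t**2 - 16*c**3*t**3
            - 7*c**2*t**4 - c*t**5 - 2*c**4*t + 4*c**3*t**2 + 12*c**2*t**3
            + c**4 + 4*c**3*t - 6*c**2*t**2 - 4*c*t**3 - 2*c**3
            - 3*c**2*t + 4*c*t**2 + c**2 + c*t - t**2)
    return c * (1 - c) * poly / (3.0 * (c*t - c + 1) ** 3 * (c*t - c - t) ** 3)


def four_nu(tau, c):
    """Asymptotic variance 4 nu(tau, c) of Theorem 4.7."""
    c = min(c, 1.0 - c)  # nu_2(tau, c) = nu_1(tau, 1 - c)
    return kappa1(tau, c) if tau < 1.0 else kappa2(tau, c)


print(mu(2.0, 0.5), four_nu(2.0, 0.5), four_nu(1.0, 0.3))
```

Evaluating `kappa1` and `kappa2` at $\tau = 1$ for the same $c$ is a quick way to see that the two branches of (25) join continuously.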



4.2 The Case of End Intervals: Relative Density for $\mathcal{U}(\delta_1, y_{(1)})$ or $\mathcal{U}(y_{(m)}, \delta_2)$ Data

Recall that with $m \ge 1$ for the end intervals, $I_1 = (\delta_1, y_{(1)})$ and $I_{m+1} = (y_{(m)}, \delta_2)$, the proximity and $\Gamma_1$-regions were only dependent on $x$ and $\tau$ (but not on $c$). Due to scale invariance from Theorem 4.1, we can assume that each of the end intervals is $(0,1)$. Let $\Gamma_{1,e}(x,\tau)$ be the $\Gamma_1$-region corresponding to $N_e(x,\tau)$ in the end interval case.

First we consider $\tau = 1$ and uniform data in the end intervals. Then for $x$ in the right end interval, $N_e(x,1) = (0, \min(1, 2x))$ for $x \in (0,1)$ and the $\Gamma_1$-region is $\Gamma_{1,e}(x,1) = (x/2,\ 1)$.

Theorem 4.8. Let $D_{[i]}(1,c)$ be the subdigraph of the central similarity PCD based on uniform data in $(\delta_1, \delta_2)$ where $\delta_1 < \delta_2$ and $\mathcal{Y}_m$ be a set of $m$ distinct $\mathcal{Y}$ points in $(\delta_1, \delta_2)$. Then for $i \in \{1, m+1\}$ (i.e., in the end intervals), as $n_i \to \infty$, we have $\sqrt{n_i}\,[\rho_{[i]}(1,c) - \mu_e(1)] \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, 4\,\nu_e(1))$, where $\mu_e(1) = 3/4$ and $4\,\nu_e(1) = 1/24$.



The proof is provided in Appendix 1. The sharpest rate of convergence in Theorem 4.8 is $K\,\dfrac{\mu_e(1)}{\sqrt{n_i\,\nu_e(1)^3}} = 36\sqrt{6}\,\dfrac{K}{\sqrt{n_i}}$ for $i \in \{1, m+1\}$.



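Next we consider the more general case of $\tau > 0$ for the end intervals. By Theorem 4.1, we can assume each end interval to be $(0,1)$. For $\tau \in (0,1)$ and $x$ in the right end interval, the proximity region is

\[ N_e(x,\tau) = \begin{cases} (x\,(1-\tau),\ x\,(1+\tau)) & \text{if } x \in (0,\ 1/(1+\tau)), \\ (x\,(1-\tau),\ 1) & \text{if } x \in (1/(1+\tau),\ 1), \end{cases} \tag{26} \]

and the $\Gamma_1$-region is

\[ \Gamma_{1,e}(x,\tau) = \begin{cases} (x/(1+\tau),\ x/(1-\tau)) & \text{if } x \in (0,\ 1-\tau), \\ (x/(1+\tau),\ 1) & \text{if } x \in (1-\tau,\ 1). \end{cases} \tag{27} \]

For $\tau \ge 1$ and $x$ in the right end interval, the proximity region is

\[ N_e(x,\tau) = \begin{cases} (0,\ x\,(1+\tau)) & \text{if } x \in (0,\ 1/(1+\tau)), \\ (0,\ 1) & \text{if } x \in (1/(1+\tau),\ 1), \end{cases} \tag{28} \]

and the $\Gamma_1$-region is $\Gamma_{1,e}(x,\tau) = (x/(1+\tau),\ 1)$.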

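Theorem 4.9. Let $D_{[i]}(\tau,c)$ be the subdigraph of the central similarity PCD based on uniform data in $(\delta_1, \delta_2)$ where $\delta_1 < \delta_2$ and $\mathcal{Y}_m$ be a set of $m$ distinct $\mathcal{Y}$ points in $(\delta_1, \delta_2)$. Then for $i \in \{1, m+1\}$ (i.e., in the end intervals), and $\tau \in (0,\infty)$, we have $\sqrt{n_i}\,[\rho_{[i]}(\tau,c) - \mu_e(\tau)] \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, 4\,\nu_e(\tau))$, as $n_i \to \infty$, where

\[ \mu_e(\tau) = \begin{cases} \dfrac{\tau\,(\tau+2)}{2\,(\tau+1)} & \text{if } 0 < \tau < 1, \\[2ex] \dfrac{1+2\tau}{2\,(\tau+1)} & \text{if } \tau \ge 1, \end{cases} \tag{29} \]

and

\[ 4\,\nu_e(\tau) = \begin{cases} \dfrac{\tau^2\,(4\tau + 4 - 2\tau^4 - 4\tau^3 - \tau^2)}{3\,(\tau+1)^3} & \text{if } 0 < \tau < 1, \\[2ex] \dfrac{\tau^2}{3\,(\tau+1)^3} & \text{if } \tau \ge 1. \end{cases} \tag{30} \]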



See Appendix 1 for the proof and Figure 5 for the plots of $\mu_e(\tau)$ and $4\,\nu_e(\tau)$. Notice that $\lim_{\tau\to\infty} \nu_e(\tau) = 0$, so the CLT result fails for $\tau = \infty$. Furthermore, $\lim_{\tau\to 0} \nu_e(\tau) = 0$. For $\tau = 1$, we have $\mu_e(\tau=1) = 3/4$ and $4\,\nu_e(\tau=1) = 1/24$; hence as $\tau \to 1$, the distribution of $\rho_{[i]}(\tau,c)$ converges to the one in Theorem 4.8 for $i \in \{1, m+1\}$. The sharpest rate of convergence in Theorem 4.9 is $K\,\mu_e(\tau)/\sqrt{n_i\,\nu_e(\tau)^3}$ (explicit form not presented) for $i \in \{1, m+1\}$ and is minimized at $\tau \approx 0.58$, which is found numerically as before. We also checked the plot of $\mu_e(\tau)/\sqrt{\nu_e(\tau)^3}$ (not presented) and verified that this is where the global minimum is attained.

Figure 5: The plots of the asymptotic mean $\mu_e(\tau)$ (left) and the variance $4\,\nu_e(\tau)$ (right) for the end intervals as a function of $\tau$ for $\tau \in (0,10]$.
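The end-interval formulas of Theorem 4.9 can be cross-checked by integrating the region lengths directly. The sketch below is one way to do this under the assumption that both regions are the unclipped intervals intersected with $(0,1)$, which agrees with (26)-(28); the helper names (`numeric_moments_e`, `mu_e`, `four_nu_e`) are illustrative only.

```python
def midpoint(f, a, b, steps=4000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def mu_e(tau):
    """Closed-form asymptotic mean for the end intervals (Theorem 4.9)."""
    if tau < 1.0:
        return tau * (tau + 2.0) / (2.0 * (tau + 1.0))
    return (1.0 + 2.0 * tau) / (2.0 * (tau + 1.0))


def four_nu_e(tau):
    """Closed-form asymptotic variance 4 nu_e(tau) (Theorem 4.9)."""
    if tau < 1.0:
        return tau**2 * (4*tau + 4 - 2*tau**4 - 4*tau**3 - tau**2) / (3.0 * (tau + 1.0) ** 3)
    return tau ** 2 / (3.0 * (tau + 1.0) ** 3)


def numeric_moments_e(tau):
    """Return (mu_e, 4 nu_e) by integrating region lengths over x1 in (0, 1)."""
    def n_len(x):      # |N_e(x, tau)|
        return min(1.0, x * (1.0 + tau)) - max(0.0, x * (1.0 - tau))

    def g_len(x):      # |Gamma_{1,e}(x, tau)|
        hi = min(1.0, x / (1.0 - tau)) if tau < 1.0 else 1.0
        return hi - x / (1.0 + tau)

    cuts = {0.0, 1.0 / (1.0 + tau), 1.0}   # breakpoints of the piecewise forms
    if tau < 1.0:
        cuts.add(1.0 - tau)
    cuts = sorted(cuts)

    def integral(f):
        return sum(midpoint(f, a, b) for a, b in zip(cuts, cuts[1:]))

    m = integral(n_len)
    second = (integral(lambda x: n_len(x) ** 2)
              + 2.0 * integral(lambda x: n_len(x) * g_len(x))
              + integral(lambda x: g_len(x) ** 2))
    return m, second - 4.0 * m ** 2


for tau in (0.5, 1.0, 2.0):
    print(tau, numeric_moments_e(tau), (mu_e(tau), four_nu_e(tau)))
```

Splitting the integral at the breakpoints keeps every piece polynomial, so the midpoint rule converges quickly.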

5 The Distribution of the Relative Density of $\mathcal{U}(\delta_1, \delta_2)$-random $D_{n,m}(\tau,c)$-digraphs

In this section, we consider the more challenging case of m > 2. 



5.1 First Version of Relative Density in the Case of m > 2 

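Recall that the relative density $\rho_{n,m}(\tau,c)$ is defined as in Equation ([S]). Letting $w_i = (y_{(i)} - y_{(i-1)})/(\delta_2 - \delta_1)$ be the relative length of the interval $I_i$ for $i = 1, 2, \ldots, m+1$ (with $y_{(0)} = \delta_1$ and $y_{(m+1)} = \delta_2$), we obtain the following as a result of Theorem 4.7.

Theorem 5.1. Let $\mathcal{X}_n$ be a random sample from $\mathcal{U}(\delta_1, \delta_2)$ with $-\infty < \delta_1 < \delta_2 < \infty$ and $\mathcal{Y}_m$ be a set of $m$ distinct points in $(\delta_1, \delta_2)$. For $\tau \in (0,\infty)$, the asymptotic distribution of $\rho_{n,m}(\tau,c)$ conditional on $\mathcal{Y}_m$ is given by

\[ \sqrt{n}\,\big(\rho_{n,m}(\tau,c) - \mu(m,\tau,c)\big) \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}\big(0,\ 4\,\nu(m,\tau,c)\big), \tag{31} \]

as $n \to \infty$, provided that $\nu(m,\tau,c) > 0$, where $\mu(m,\tau,c) = \widetilde{\mu}(m,\tau,c) \big/ \big(\sum_{i=1}^{m+1} w_i^2\big)$ with $\widetilde{\mu}(m,\tau,c) = \mu(\tau,c) \sum_{i=2}^{m} w_i^2 + \mu_e(\tau) \sum_{i \in \{1,m+1\}} w_i^2$, and $\mu(\tau,c)$ and $\mu_e(\tau)$ are as in Theorems 4.7 and 4.9, respectively. Furthermore, $4\,\nu(m,\tau,c) = 4\,\widetilde{\nu}(m,\tau,c) \big/ \big(\sum_{i=1}^{m+1} w_i^2\big)^2$ with

\[ 4\,\widetilde{\nu}(m,\tau,c) = [P_{2N} + 2\,P_{NG} + P_{2G}] \sum_{i=2}^{m} w_i^3 + [P_{2N,e} + 2\,P_{NG,e} + P_{2G,e}] \sum_{i \in \{1,m+1\}} w_i^3 - 4\,\big(\widetilde{\mu}(m,\tau,c)\big)^2. \]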

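Proof is provided in Appendix 2. Notice that if $y_{(1)} = \delta_1$ and $y_{(m)} = \delta_2$, there are only $m-1$ middle intervals formed by $\mathcal{Y}_m$. That is, the end intervals are empty, $I_1 = I_{m+1} = \emptyset$. Hence in Theorem 5.1, $\mu(m,\tau,c) = \mu(\tau,c)$ since $\widetilde{\mu}(m,\tau,c) = \mu(\tau,c) \sum_{i=2}^{m} w_i^2$. Furthermore, $4\,\widetilde{\nu}(m,\tau,c) = [P_{2N} + 2\,P_{NG} + P_{2G}] \sum_{i=2}^{m} w_i^3 - 4\,\big(\mu(\tau,c) \sum_{i=2}^{m} w_i^2\big)^2 = 4\,\nu(\tau,c) \sum_{i=2}^{m} w_i^3 + 4\,\mu^2(\tau,c) \Big(\sum_{i=2}^{m} w_i^3 - \big(\sum_{i=2}^{m} w_i^2\big)^2\Big)$.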



5.2 Second Version of Relative Density in the Case of m > 2 

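For $m > 2$, if we consider the entire data set $\mathcal{X}_n$, then we have $n$ vertices. So we can also consider the relative density as $\widetilde{\rho}_{n,m}(\tau,c) = |\mathcal{A}| / (n\,(n-1))$.

Theorem 5.2. Let $\mathcal{X}_n$ be a random sample from $\mathcal{U}(\delta_1, \delta_2)$ with $-\infty < \delta_1 < \delta_2 < \infty$ and $\mathcal{Y}_m$ be a set of $m$ distinct points in $(\delta_1, \delta_2)$. For $\tau \in (0,\infty)$, the asymptotic distribution of $\widetilde{\rho}_{n,m}(\tau,c)$ conditional on $\mathcal{Y}_m$ is given by

\[ \sqrt{n}\,\big(\widetilde{\rho}_{n,m}(\tau,c) - \widetilde{\mu}(m,\tau,c)\big) \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}\big(0,\ 4\,\widetilde{\nu}(m,\tau,c)\big), \tag{32} \]

as $n \to \infty$, provided that $\widetilde{\nu}(m,\tau,c) > 0$, where $\widetilde{\mu}(m,\tau,c)$ and $\widetilde{\nu}(m,\tau,c)$ are as in Theorem 5.1.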

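Proof is provided in Appendix 2. Notice that the relative arc densities $\rho_{n,m}(\tau,c)$ and $\widetilde{\rho}_{n,m}(\tau,c)$ do not have the same distribution for either finite or infinite $n$. But we have $\rho_{n,m}(\tau,c) = \dfrac{n\,(n-1)}{\sum_{i=1}^{m+1} n_i\,(n_i-1)}\ \widetilde{\rho}_{n,m}(\tau,c)$, and since for large $n_i$ and $n$, $\sum_{i=1}^{m+1} \dfrac{n_i\,(n_i-1)}{n\,(n-1)} \approx \sum_{i=1}^{m+1} w_i^2 < 1$, it follows that $\widetilde{\mu}(m,\tau,c) < \mu(m,\tau,c)$ and $\widetilde{\nu}(m,\tau,c) < \nu(m,\tau,c)$ for large $n_i$ and $n$. Furthermore, the asymptotic normality holds for $\rho_{n,m}(\tau,c)$ iff it holds for $\widetilde{\rho}_{n,m}(\tau,c)$.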
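To make the weighting in Theorems 5.1 and 5.2 concrete, the sketch below computes the interval weights and the two asymptotic means $\widetilde{\mu}(m,\tau,c)$ and $\mu(m,\tau,c)$ for a given set of $\mathcal{Y}$ points. It assumes that $w_i$ is the length of the $i$-th interval (end intervals first and last) divided by $\delta_2 - \delta_1$; all function names and the example points are hypothetical.

```python
def mu_middle(tau, c):
    """mu(tau, c) for a middle interval (Theorem 4.7)."""
    c = min(c, 1.0 - c)
    if tau < 1.0:
        return tau / 2.0
    num = tau * (1 - 2 * c + 2 * c ** 2 + 2 * c * tau - 2 * c ** 2 * tau)
    return num / (2.0 * (c * tau - c + 1) * (tau + c - c * tau))


def mu_end(tau):
    """mu_e(tau) for an end interval (Theorem 4.9)."""
    if tau < 1.0:
        return tau * (tau + 2.0) / (2.0 * (tau + 1.0))
    return (1.0 + 2.0 * tau) / (2.0 * (tau + 1.0))


def interval_weights(delta1, delta2, ys):
    """Relative lengths of I_1, ..., I_{m+1} cut out of (delta1, delta2) by ys."""
    pts = [delta1] + sorted(ys) + [delta2]
    return [(b - a) / (delta2 - delta1) for a, b in zip(pts, pts[1:])]


def asymptotic_means(delta1, delta2, ys, tau, c):
    """Return (mu(m, tau, c), mu_tilde(m, tau, c))."""
    w = interval_weights(delta1, delta2, ys)
    middle = sum(v ** 2 for v in w[1:-1])      # sum over middle intervals
    end = w[0] ** 2 + w[-1] ** 2               # sum over the two end intervals
    mu_tilde = mu_middle(tau, c) * middle + mu_end(tau) * end
    return mu_tilde / (middle + end), mu_tilde


mu_m, mu_tilde = asymptotic_means(0.0, 1.0, [0.25, 0.75], 1.0, 0.5)
print(mu_m, mu_tilde)
```

Because $\sum_i w_i^2 < 1$ whenever there is more than one nonempty interval, the second value returned is always smaller than the first, in line with the remark above.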

6 Extension of Central Similarity Proximity Regions to Higher 
Dimensions 

Note that in $\mathbb{R}$ the central similarity PCDs are based on the intervals whose end points are from class $\mathcal{Y}$. This interval partitioning can be viewed as the Delaunay tessellation of $\mathbb{R}$ based on $\mathcal{Y}_m$. So in higher dimensions, we use the Delaunay tessellation based on $\mathcal{Y}_m$ to partition the space.

Let $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\}$ be $m$ points in general position in $\mathbb{R}^d$ and $T_i$ be the $i$-th Delaunay cell for $i = 1, 2, \ldots, J_m$, where $J_m$ is the number of Delaunay cells. Let $\mathcal{X}_n$ be a set of iid random variables from distribution $F$ in $\mathbb{R}^d$ with support $\mathcal{S}(F) \subseteq \mathcal{C}_H(\mathcal{Y}_m)$ where $\mathcal{C}_H(\mathcal{Y}_m)$ stands for the convex hull of $\mathcal{Y}_m$.



6.1 Extension of Central Similarity Proximity Regions to $\mathbb{R}^2$

For illustrative purposes, we focus on $\mathbb{R}^2$ where a Delaunay tessellation is a triangulation, provided that no more than three points in $\mathcal{Y}_m$ are cocircular (i.e., lie on the same circle). Furthermore, for simplicity, we only consider the one Delaunay triangle case. Let $\mathcal{Y}_3 = \{y_1, y_2, y_3\}$ be three non-collinear points in $\mathbb{R}^2$ and $T(\mathcal{Y}_3) = T(y_1, y_2, y_3)$ be the triangle with vertices $\mathcal{Y}_3$. Let $\mathcal{X}_n$ be a set of iid random variables from $F$ with support $\mathcal{S}(F) \subseteq T(\mathcal{Y}_3)$.

For the expansion parameter $\tau \in (0,\infty]$, define $N(x,\tau,M_C)$ to be the central similarity proximity map with expansion parameter $\tau$ as follows; see also Figure 6. Let $e_j$ be the edge opposite vertex $y_j$ for $j = 1, 2, 3$, and let "edge regions" $R_E(e_1)$, $R_E(e_2)$, $R_E(e_3)$ partition $T(\mathcal{Y}_3)$ using line segments from the center of mass of $T(\mathcal{Y}_3)$ to the vertices. For $x \in (T(\mathcal{Y}_3))^o$, let $e(x)$ be the edge in whose region $x$ falls; $x \in R_E(e(x))$. If $x$ falls on the boundary of two edge regions we assign $e(x)$ arbitrarily. For $\tau > 0$, the central similarity proximity region $N(x,\tau,M_C)$ is defined to be the triangle $T_{CS}(x,\tau) \cap T(\mathcal{Y}_3)$ with the following properties:

(i) For $\tau \in (0,1]$, the triangle $T_{CS}(x,\tau)$ has an edge $e_\tau(x)$ parallel to $e(x)$ such that $d(x, e_\tau(x)) = \tau\,d(x, e(x))$ and $d(e_\tau(x), e(x)) \le d(x, e(x))$, and for $\tau > 1$, $d(e_\tau(x), e(x)) < d(x, e_\tau(x))$, where $d(x, e(x))$ is the Euclidean distance from $x$ to $e(x)$,

(ii) the triangle $T_{CS}(x,\tau)$ has the same orientation as and is similar to $T(\mathcal{Y}_3)$,

(iii) the point $x$ is at the center of mass of $T_{CS}(x,\tau)$.
Note that (i) implies the expansion parameter $\tau$, (ii) implies "similarity", and (iii) implies "central" in the name, (parameterized) central similarity proximity map. Notice that $\tau > 0$ implies that $x \in N(x,\tau,M_C)$ and, by construction, we have $N(x,\tau,M_C) \subseteq T(\mathcal{Y}_3)$ for all $x \in T(\mathcal{Y}_3)$. For $x \in \partial(T(\mathcal{Y}_3))$ and $\tau \in (0,\infty]$, we define $N(x,\tau,M_C) = \{x\}$. For all $x \in T(\mathcal{Y}_3)^o$ the edges $e_\tau(x)$ and $e(x)$ are coincident iff $\tau = 1$. Note also that $\lim_{\tau\to\infty} N(x,\tau,M_C) = T(\mathcal{Y}_3)$ for all $x \in (T(\mathcal{Y}_3))^o$, so we define $N(x,\infty,M_C) = T(\mathcal{Y}_3)$ for all such $x$.
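Properties (i)-(iii) determine the triangle $T_{CS}(x,\tau)$ explicitly. The sketch below is one possible way to compute its vertices, not the paper's implementation: it uses the observation that, in barycentric coordinates, $x$ lies in the edge region of the vertex with the smallest coordinate $\lambda_{\min}$, and that scaling $T(\mathcal{Y}_3)$ about its center of mass by the factor $3\tau\lambda_{\min}$ and translating the center of mass to $x$ gives a similar, equally oriented triangle with $d(x, e_\tau(x)) = \tau\,d(x, e(x))$. The function names are hypothetical, and the intersection with $T(\mathcal{Y}_3)$ that is needed for $\tau > 1$ is not performed.

```python
def barycentric(x, tri):
    """Barycentric coordinates of the point x with respect to the triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x[0] - x3) + (x3 - x2) * (x[1] - y3)) / det
    l2 = ((y3 - y1) * (x[0] - x3) + (x1 - x3) * (x[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def cs_triangle(x, tau, tri):
    """Vertices of T_CS(x, tau) for x in the interior of tri (no clipping to tri)."""
    lam_min = min(barycentric(x, tri))          # picks the edge region of x
    scale = 3.0 * tau * lam_min                 # similarity ratio w.r.t. tri
    cx = sum(v[0] for v in tri) / 3.0           # center of mass of tri
    cy = sum(v[1] for v in tri) / 3.0
    return [(x[0] + scale * (vx - cx), x[1] + scale * (vy - cy)) for vx, vy in tri]


tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
verts = cs_triangle((0.5, 0.1), 1.0, tri)
print(verts)
```

For $\tau \le 1$ the returned triangle already lies inside $T(\mathcal{Y}_3)$, and for $\tau = 1$ one of its edges lies on $e(x)$, which matches the remark that $e_\tau(x)$ and $e(x)$ coincide iff $\tau = 1$.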




Figure 6: Construction of central similarity proximity region, $N(x, \tau = 1/2, M_C)$ (shaded region) for an $x \in R_E(e_3)$ where $h_2 = d(x, e_\tau(x)) = \frac{1}{2}\,d(x, e(x))$ and $h_1 = d(x, e(x))$.



6.2 Extension of Central Similarity Proximity Regions to $\mathbb{R}^d$ with $d > 2$

The extension to $\mathbb{R}^d$ for $d > 2$ with $M = M_C$ is provided in Ceyhan and Priebe (2005); the extension for general $M$ is similar. The extension of $N^\tau$ to $\mathbb{R}^d$ for $d > 2$ is straightforward. Let $\mathcal{Y}_{d+1} = \{y_1, y_2, \ldots, y_{d+1}\}$ be $d+1$ points in general position (in particular, non-coplanar). Denote the simplex formed by these $d+1$ points as $\mathfrak{S}(\mathcal{Y}_{d+1})$. (A simplex is the simplest polytope in $\mathbb{R}^d$ having $d+1$ vertices, $d\,(d+1)/2$ edges and $d+1$ faces of dimension $(d-1)$.) For $\tau \in [0,1]$, define the central similarity proximity map as follows. Let $\varphi_j$ be the face opposite vertex $y_j$ for $j = 1, 2, \ldots, d+1$, and "face regions" $R(\varphi_1), \ldots, R(\varphi_{d+1})$ partition $\mathfrak{S}(\mathcal{Y}_{d+1})$ into $d+1$ regions, namely the $d+1$ polytopes with vertices being the center of mass together with $d$ vertices chosen from $d+1$ vertices. For $x \in \mathfrak{S}(\mathcal{Y}_{d+1}) \setminus \mathcal{Y}_{d+1}$, let $\varphi(x)$ be the face in whose region $x$ falls; $x \in R(\varphi(x))$. (If $x$ falls on the boundary of two face regions, we assign $\varphi(x)$ arbitrarily.) For $\tau \in (0,1]$, the $\tau$-factor central similarity proximity region $N(x,\tau,M_C) = N^\tau(x)$ is defined to be the simplex $\mathfrak{S}_\tau(x)$ with the following properties:

(i) $\mathfrak{S}_\tau(x)$ has a face $\varphi_\tau(x)$ parallel to $\varphi(x)$ such that $\tau\,d(x, \varphi(x)) = d(\varphi_\tau(x), x)$ where $d(x, \varphi(x))$ is the Euclidean (perpendicular) distance from $x$ to $\varphi(x)$,

(ii) $\mathfrak{S}_\tau(x)$ has the same orientation as and is similar to $\mathfrak{S}(\mathcal{Y}_{d+1})$,

(iii) $x$ is at the center of mass of $\mathfrak{S}_\tau(x)$. Note that $\tau > 1$ implies that $x \in N(x,\tau,M_C)$.



For $\tau = 0$, define $N(x,\tau,M_C) = \{x\}$ for all $x \in \mathfrak{S}(\mathcal{Y}_{d+1})$.



Theorem 4.1 generalizes, so that any simplex $\mathfrak{S}$ in $\mathbb{R}^d$ can be transformed into a regular polytope (with edges being equal in length and faces being equal in volume) preserving uniformity. Delaunay triangulation becomes Delaunay tessellation in $\mathbb{R}^d$, provided no more than $d+1$ points are cospherical (lying on the boundary of the same sphere). In particular, with $d = 3$, the general simplex is a tetrahedron (4 vertices, 4 triangular faces and 6 edges), which can be mapped into a regular tetrahedron (4 faces are equilateral triangles) with vertices $(0,0,0)$, $(1,0,0)$, $(1/2, \sqrt{3}/2, 0)$, $(1/2, \sqrt{3}/6, \sqrt{6}/3)$.

Asymptotic normality of the $U$-statistic and consistency of the tests hold for $d > 2$.
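That the four vertices listed above do form a regular tetrahedron can be confirmed by computing all six pairwise distances. The short check below is purely illustrative; the variable names are not from the paper.

```python
from itertools import combinations
from math import dist, sqrt

# Vertices of the regular tetrahedron quoted in the text.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, sqrt(3) / 2, 0.0),
    (0.5, sqrt(3) / 6, sqrt(6) / 3),
]

# A tetrahedron is regular exactly when all six edges have the same length.
edge_lengths = [dist(p, q) for p, q in combinations(vertices, 2)]
print(edge_lengths)
```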



7 Discussion 



In this article, we consider the relative density of a random digraph family called central similarity proximity catch digraph (PCD) which is based on two classes of points (in $\mathbb{R}$). The central similarity PCDs have an expansion parameter $\tau > 0$ and a centrality parameter $c \in (0, 1/2)$. We demonstrate that the relative density of the central similarity PCDs is a $U$-statistic. Then, applying the central limit theory of $U$-statistics, we derive the (asymptotic normal) distribution of the relative density for uniform data for the entire ranges of $\tau$ and $c$. We also determine the parameters $\tau$ and $c$ for which the rate of convergence to normality is the fastest.



We can apply the relative density in testing one dimensional bivariate spatial point patterns, as done in Ceyhan et al. (2007) for two-dimensional data. Let $\mathcal{X}$ and $\mathcal{Y}$ be two classes of points which lie in a compact interval in $\mathbb{R}$. Then our null hypothesis is some form of complete spatial randomness of $\mathcal{X}$ points, which implies that the $\mathcal{X}$ points are uniformly distributed in the support interval irrespective of the distribution of the $\mathcal{Y}$ points. The alternatives are the segregation of $\mathcal{X}$ from $\mathcal{Y}$ points or association of $\mathcal{X}$ points with $\mathcal{Y}$ points. In general, association is the pattern in which the points from the two different classes occur close to each other, while segregation is the pattern in which the points from the same class tend to cluster together. In this context, under association, $\mathcal{X}$ points are clustered around $\mathcal{Y}$ points, while under segregation, $\mathcal{X}$ points are clustered away from the $\mathcal{Y}$ points. Notice that we can use the asymptotic distribution (i.e., the normal approximation) of the relative density for spatial pattern tests, so our methodology requires the number of $\mathcal{X}$ points to be much larger than the number of $\mathcal{Y}$ points. Our results will make the power comparisons possible for data from large families of distributions. Moreover, one might determine the optimal (with respect to empirical size and power) parameter values against segregation and association alternatives.



The central similarity PCDs for one dimensional data can be used in classification as outlined in Priebe et al. (2003a), if a high dimensional data set can be projected to one dimensional space with insubstantial information loss (by some dimension reduction method). In the classification procedure, one might also determine the optimal parameters (with respect to some penalty function) for the best performance. Furthermore, this work forms the foundation of the generalizations and calculations for uniform and non-uniform cases in multiple dimensions. See Section 6 for the details of the extension to higher dimensions. For example, in $\mathbb{R}^2$, the expansion parameter is still $\tau$, but the centrality parameter is $M = (m_1, m_2)$, which is two dimensional. The optimal parameters for testing spatial patterns and classification can also be determined, as in the one dimensional case.
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APPENDIX 1: Proofs for the One Interval Case 



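Proof of Theorem 4.5:

Depending on the location of $x_1$, the following are the different types of the combinations of $N(x_1,1,c)$ and $\Gamma_1(x_1,1,c)$.

(i) for $0 < x_1 \le c$, we have $N(x_1,1,c) = (0,\ x_1/c)$ and $\Gamma_1(x_1,1,c) = (c\,x_1,\ (1-c)\,x_1 + c)$,

(ii) for $c < x_1 < 1$, $N(x_1,1,c) = ((x_1-c)/(1-c),\ 1)$ and $\Gamma_1(x_1,1,c) = (c\,x_1,\ (1-c)\,x_1 + c)$.

Then $\mu(1,c) = P(X_2 \in N(X_1,1,c)) = \int_0^c \frac{x_1}{c}\,dx_1 + \int_c^1 \left(1 - \frac{x_1-c}{1-c}\right) dx_1 = 1/2$.

For $\mathrm{Cov}(h_{12}, h_{13})$, we need to calculate $P_{2N}$, $P_{NG}$, and $P_{2G}$.

\[ P_{2N} = P(\{X_2,X_3\} \subset N(X_1,1,c)) = \int_0^c \left(\frac{x_1}{c}\right)^2 dx_1 + \int_c^1 \left(1 - \frac{x_1-c}{1-c}\right)^2 dx_1 = 1/3. \]

\[ P_{NG} = P(X_2 \in N(X_1,1,c),\ X_3 \in \Gamma_1(X_1,1,c)) = \int_0^c \frac{x_1}{c}\,(x_1 + c - 2\,c\,x_1)\,dx_1 + \int_c^1 \left(1 - \frac{x_1-c}{1-c}\right)(x_1 + c - 2\,c\,x_1)\,dx_1 = -c^2/3 + c/3 + 1/6. \]

Finally, $P_{2G} = P(\{X_2,X_3\} \subset \Gamma_1(X_1,1,c)) = \int_0^1 (x_1 + c - 2\,c\,x_1)^2\,dx_1 = c^2/3 - c/3 + 1/3$.

Therefore $4\,\mathbf{E}[h_{12}h_{13}] = P_{2N} + 2\,P_{NG} + P_{2G} = -c^2/3 + c/3 + 1$. Hence $4\,\nu(1,c) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = c\,(1-c)/3$. ■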

Proof of Theorem 4.6:

There are two cases for $\tau$, namely $0 < \tau < 1$ and $\tau \ge 1$.

Case 1: $0 < \tau < 1$: In this case depending on the location of $x_1$, the following are the different types of the combinations of $N(x_1,\tau,1/2)$ and $\Gamma_1(x_1,\tau,1/2)$.

(i) for $0 < x_1 \le (1-\tau)/2$, we have $N(x_1,\tau,1/2) = (x_1\,(1-\tau),\ x_1\,(1+\tau))$ and $\Gamma_1(x_1,\tau,1/2) = (x_1/(1+\tau),\ x_1/(1-\tau))$,

(ii) for $(1-\tau)/2 < x_1 \le 1/2$, we have $N(x_1,\tau,1/2) = (x_1\,(1-\tau),\ x_1\,(1+\tau))$ and $\Gamma_1(x_1,\tau,1/2) = (x_1/(1+\tau),\ (x_1+\tau)/(1+\tau))$.

Then $\mu(\tau,1/2) = P(X_2 \in N(X_1,\tau,1/2)) = 2\,P(X_2 \in N(X_1,\tau,1/2),\ X_1 \in (0,1/2))$ by symmetry and

\[ P(X_2 \in N(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{1/2} \big(x_1\,(1+\tau) - x_1\,(1-\tau)\big)\,dx_1 = \int_0^{1/2} 2\,x_1\,\tau\,dx_1 = \tau/4. \]

So $\mu(\tau,1/2) = 2\,(\tau/4) = \tau/2$.
Jo Jo 
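For $\mathrm{Cov}(h_{12}, h_{13})$, we need to calculate $P_{2N}$, $P_{NG}$, and $P_{2G}$.

\[ P_{2N} = P(\{X_2,X_3\} \subset N(X_1,\tau,1/2)) = 2\,P(\{X_2,X_3\} \subset N(X_1,\tau,1/2),\ X_1 \in (0,1/2)) \]

and

\[ P(\{X_2,X_3\} \subset N(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{1/2} (2\,x_1\,\tau)^2\,dx_1 = \tau^2/6. \]

So $P_{2N} = 2\,(\tau^2/6) = \tau^2/3$.

\[ P_{NG} = P(X_2 \in N(X_1,\tau,1/2),\ X_3 \in \Gamma_1(X_1,\tau,1/2)) = 2\,P(X_2 \in N(X_1,\tau,1/2),\ X_3 \in \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) \]

and

\[ P(X_2 \in N(X_1,\tau,1/2),\ X_3 \in \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{(1-\tau)/2} (2\,x_1\,\tau)\left(\frac{x_1}{1-\tau} - \frac{x_1}{1+\tau}\right) dx_1 + \int_{(1-\tau)/2}^{1/2} (2\,x_1\,\tau)\left(\frac{x_1+\tau}{1+\tau} - \frac{x_1}{1+\tau}\right) dx_1 \]
\[ = \int_0^{(1-\tau)/2} (2\,x_1\,\tau)\left(\frac{2\,x_1\,\tau}{1-\tau^2}\right) dx_1 + \int_{(1-\tau)/2}^{1/2} (2\,x_1\,\tau)\left(\frac{\tau}{1+\tau}\right) dx_1 = \frac{(2 + 2\tau - \tau^2)\,\tau^2}{12\,(\tau+1)}. \]

So $P_{NG} = \dfrac{(2 + 2\tau - \tau^2)\,\tau^2}{6\,(\tau+1)}$.

Finally,

\[ P_{2G} = P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,1/2)) = 2\,P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) \]

and

\[ P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{(1-\tau)/2} \left(\frac{2\,x_1\,\tau}{1-\tau^2}\right)^2 dx_1 + \int_{(1-\tau)/2}^{1/2} \left(\frac{\tau}{1+\tau}\right)^2 dx_1 = \frac{\tau^2\,(1+2\tau)}{6\,(\tau+1)^2}. \]

So $P_{2G} = \dfrac{\tau^2\,(1+2\tau)}{3\,(\tau+1)^2}$.

Therefore $4\,\mathbf{E}[h_{12}h_{13}] = P_{2N} + 2\,P_{NG} + P_{2G} = \dfrac{\tau^2\,(4 + 8\tau + 2\tau^2 - \tau^3)}{3\,(\tau+1)^2}$. Hence $4\,\nu(\tau,1/2) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = \dfrac{\tau^2\,(-\tau^3 - \tau^2 + 2\tau + 1)}{3\,(\tau+1)^2}$.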

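Case 2: $\tau \ge 1$: In this case depending on the location of $x_1$, the following are the different types of the combinations of $N(x_1,\tau,1/2)$ and $\Gamma_1(x_1,\tau,1/2)$.

(i) for $0 < x_1 \le 1/(1+\tau)$, we have $N(x_1,\tau,1/2) = (0,\ x_1\,(1+\tau))$ and $\Gamma_1(x_1,\tau,1/2) = (x_1/(1+\tau),\ (x_1+\tau)/(1+\tau))$,

(ii) for $1/(1+\tau) < x_1 \le 1/2$, we have $N(x_1,\tau,1/2) = (0,\ 1)$ and $\Gamma_1(x_1,\tau,1/2) = (x_1/(1+\tau),\ (x_1+\tau)/(1+\tau))$.

Then $\mu(\tau,1/2) = P(X_2 \in N(X_1,\tau,1/2)) = 2\,P(X_2 \in N(X_1,\tau,1/2),\ X_1 \in (0,1/2))$ by symmetry and

\[ P(X_2 \in N(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{1/(1+\tau)} x_1\,(1+\tau)\,dx_1 + \int_{1/(1+\tau)}^{1/2} 1\,dx_1 = \frac{\tau}{2\,(\tau+1)}. \]

So $\mu(\tau,1/2) = 2\left(\dfrac{\tau}{2\,(\tau+1)}\right) = \dfrac{\tau}{\tau+1}$.

Next

\[ P_{2N} = P(\{X_2,X_3\} \subset N(X_1,\tau,1/2)) = 2\,P(\{X_2,X_3\} \subset N(X_1,\tau,1/2),\ X_1 \in (0,1/2)) \]

and

\[ P(\{X_2,X_3\} \subset N(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{1/(1+\tau)} \big(x_1\,(1+\tau)\big)^2 dx_1 + \int_{1/(1+\tau)}^{1/2} 1\,dx_1 = \frac{3\tau - 1}{6\,(\tau+1)}. \]

So $P_{2N} = 2\left(\dfrac{3\tau-1}{6\,(\tau+1)}\right) = \dfrac{3\tau-1}{3\,(\tau+1)}$.

\[ P_{NG} = P(X_2 \in N(X_1,\tau,1/2),\ X_3 \in \Gamma_1(X_1,\tau,1/2)) = 2\,P(X_2 \in N(X_1,\tau,1/2),\ X_3 \in \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) \]

and

\[ P(X_2 \in N(X_1,\tau,1/2),\ X_3 \in \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{1/(1+\tau)} \big(x_1\,(1+\tau)\big)\left(\frac{\tau}{1+\tau}\right) dx_1 + \int_{1/(1+\tau)}^{1/2} \frac{\tau}{1+\tau}\,dx_1 = \frac{\tau^2}{2\,(1+\tau)^2}. \]

So $P_{NG} = \dfrac{\tau^2}{(1+\tau)^2}$.

Finally,

\[ P_{2G} = P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,1/2)) = 2\,P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) \]

and

\[ P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,1/2),\ X_1 \in (0,1/2)) = \int_0^{1/2} \left(\frac{\tau}{1+\tau}\right)^2 dx_1 = \frac{\tau^2}{2\,(1+\tau)^2}. \]

So $P_{2G} = \dfrac{\tau^2}{(1+\tau)^2}$.

Therefore $4\,\mathbf{E}[h_{12}h_{13}] = P_{2N} + 2\,P_{NG} + P_{2G} = \dfrac{12\tau^2 + 2\tau - 1}{3\,(\tau+1)^2}$. Hence $4\,\nu(\tau,1/2) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = \dfrac{2\tau - 1}{3\,(\tau+1)^2}$. ■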



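Proof of Theorem 4.7:

First we consider $0 < c \le 1/2$. There are two cases for $\tau$, namely $0 < \tau < 1$ and $\tau \ge 1$.

Case 1: $0 < \tau < 1$: In this case depending on the location of $x_1$, the following are the different types of the combinations of $N(x_1,\tau,c)$ and $\Gamma_1(x_1,\tau,c)$. Let $a_1 := x_1\,(1-\tau)$, $a_2 := x_1\left(1 + \frac{(1-c)\,\tau}{c}\right)$, $a_3 := x_1 - \frac{c\,\tau\,(1-x_1)}{1-c}$, $a_4 := x_1 + (1-x_1)\,\tau$, and $g_1 := \frac{c\,x_1}{c+(1-c)\,\tau}$, $g_2 := \frac{x_1}{1-\tau}$, $g_3 := \frac{x_1-\tau}{1-\tau}$, $g_4 := \frac{x_1\,(1-c)+c\,\tau}{1-c+c\,\tau}$. Then

(i) for $0 < x_1 \le c\,(1-\tau)$, we have $N(x_1,\tau,c) = (a_1, a_2)$ and $\Gamma_1(x_1,\tau,c) = (g_1, g_2)$,

(ii) for $c\,(1-\tau) < x_1 \le c$, we have $N(x_1,\tau,c) = (a_1, a_2)$ and $\Gamma_1(x_1,\tau,c) = (g_1, g_4)$,

(iii) for $c < x_1 \le c\,(1-\tau) + \tau$, we have $N(x_1,\tau,c) = (a_3, a_4)$ and $\Gamma_1(x_1,\tau,c) = (g_1, g_4)$,

(iv) for $c\,(1-\tau) + \tau < x_1 < 1$, we have $N(x_1,\tau,c) = (a_3, a_4)$ and $\Gamma_1(x_1,\tau,c) = (g_3, g_4)$.

Then $\mu(\tau,c) = P(X_2 \in N(X_1,\tau,c)) = \int_0^c (a_2 - a_1)\,dx_1 + \int_c^1 (a_4 - a_3)\,dx_1 = \tau/2$.

For $\mathrm{Cov}(h_{12}, h_{13})$, we need to calculate $P_{2N}$, $P_{NG}$, and $P_{2G}$.

\[ P_{2N} = P(\{X_2,X_3\} \subset N(X_1,\tau,c)) = \int_0^c (a_2 - a_1)^2\,dx_1 + \int_c^1 (a_4 - a_3)^2\,dx_1 = \tau^2/3. \]

\[ P_{NG} = P(X_2 \in N(X_1,\tau,c),\ X_3 \in \Gamma_1(X_1,\tau,c)) = \int_0^{c(1-\tau)} (a_2 - a_1)(g_2 - g_1)\,dx_1 + \int_{c(1-\tau)}^{c} (a_2 - a_1)(g_4 - g_1)\,dx_1 + \int_c^{c(1-\tau)+\tau} (a_4 - a_3)(g_4 - g_1)\,dx_1 + \int_{c(1-\tau)+\tau}^{1} (a_4 - a_3)(g_4 - g_3)\,dx_1 \]
\[ = \frac{\tau^2\,(c^2\tau^3 - 5c^2\tau^2 - c\tau^3 + 4c^2\tau + 5c\tau^2 - 2c^2 - 4c\tau - \tau^2 + 2c + 2\tau)}{6\,(c\tau - c + 1)(c + \tau - c\tau)}. \]

Finally,

\[ P_{2G} = P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,c)) = \int_0^{c(1-\tau)} (g_2 - g_1)^2\,dx_1 + \int_{c(1-\tau)}^{c(1-\tau)+\tau} (g_4 - g_1)^2\,dx_1 + \int_{c(1-\tau)+\tau}^{1} (g_4 - g_3)^2\,dx_1 = \frac{\tau^2\,(2c^2\tau - c^2 - 2c\tau + c + \tau)}{3\,(c\tau - c + 1)(c + \tau - c\tau)}. \]

Therefore

\[ 4\,\mathbf{E}[h_{12}h_{13}] = P_{2N} + 2\,P_{NG} + P_{2G} = \frac{\tau^2\,(c^2\tau^3 - 6c^2\tau^2 - c\tau^3 + 8c^2\tau + 6c\tau^2 - 4c^2 - 8c\tau - \tau^2 + 4c + 4\tau)}{3\,(c\tau - c + 1)(c + \tau - c\tau)}. \]

Hence $4\,\nu_1(\tau,c) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = \kappa_1(\tau,c) = \dfrac{\tau^2\,(c^2\tau^3 - 3c^2\tau^2 - c\tau^3 + 2c^2\tau + 3c\tau^2 - c^2 - 2c\tau - \tau^2 + c + \tau)}{3\,(c\tau - c + 1)(c + \tau - c\tau)}$.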

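Case 2: $\tau \ge 1$: In this case depending on the location of $x_1$, the following are the different types of the combinations of $N(x_1,\tau,c)$ and $\Gamma_1(x_1,\tau,c)$.

(i) for $0 < x_1 \le \frac{c}{c+(1-c)\,\tau}$, we have $N(x_1,\tau,c) = (0, a_2)$ and $\Gamma_1(x_1,\tau,c) = (g_1, g_4)$,

(ii) for $\frac{c}{c+(1-c)\,\tau} < x_1 \le \frac{c\,\tau}{1-c+c\,\tau}$, we have $N(x_1,\tau,c) = (0, 1)$ and $\Gamma_1(x_1,\tau,c) = (g_1, g_4)$,

(iii) for $\frac{c\,\tau}{1-c+c\,\tau} < x_1 < 1$, we have $N(x_1,\tau,c) = (a_3, 1)$ and $\Gamma_1(x_1,\tau,c) = (g_1, g_4)$.

Then

\[ \mu(\tau,c) = P(X_2 \in N(X_1,\tau,c)) = \int_0^{\frac{c}{c+(1-c)\tau}} a_2\,dx_1 + \int_{\frac{c}{c+(1-c)\tau}}^{\frac{c\tau}{1-c+c\tau}} 1\,dx_1 + \int_{\frac{c\tau}{1-c+c\tau}}^{1} (1 - a_3)\,dx_1 = \frac{\tau\,(2c^2\tau - 2c^2 - 2c\tau + 2c - 1)}{2\,(c\tau - c + 1)(c\tau - c - \tau)}. \]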
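Next

\[ P_{2N} = P(\{X_2,X_3\} \subset N(X_1,\tau,c)) = \int_0^{\frac{c}{c+(1-c)\tau}} a_2^2\,dx_1 + \int_{\frac{c}{c+(1-c)\tau}}^{\frac{c\tau}{1-c+c\tau}} 1\,dx_1 + \int_{\frac{c\tau}{1-c+c\tau}}^{1} (1 - a_3)^2\,dx_1 = \frac{3c^2\tau^2 - 2c^2\tau - 3c\tau^2 - c^2 + 2c\tau + c - \tau}{3\,(c\tau - c + 1)(c\tau - c - \tau)}. \]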



Png = ^^(^2 e iV(Xi,T,c),X3 e ri(Xi, T,c)) = 

/ a2 (gi - gi)dxi + {gi~gi)dxi+ {1 - a^) {g4: - gi)dxi = 

[r2(6c^ r"* - 24 - 18 + 36 + 72 + 18 c"* r"' - 24 t - 108 r2-84c'' r^-Gc^ t'^ + 6c^^+ 
72c'V+132c* t2+48 - 18 - 92 c^'r- 84 t^~12c^ t3 + 26c^+64 cV+30 r2-22 0^-26 r-6 ct2+ 

10c2 + 6cr-2c - t)]/[6 (cr - c + 1)^ (cr - c - r)^]. 

Finally,

\[ P_{2G} = P(\{X_2,X_3\} \subset \Gamma_1(X_1,\tau,c)) = \int_0^1 (g_4 - g_1)^2\,dx_1 = \frac{(3c^4\tau^2 - 6c^4\tau - 6c^3\tau^2 + 3c^4 + 12c^3\tau + 3c^2\tau^2 - 6c^3 - 9c^2\tau + 7c^2 + 3c\tau - 4c + 1)\,\tau^2}{3\,(c\tau - c + 1)^2\,(c\tau - c - \tau)^2}. \]

Therefore 

4 E[/ii2/ii3] = P2W + 2 Ptvg + P2G = [12 c*' rf' - 50 c'^ - 36 c^ r^ + 79 c'^ r^ + 150 c'^ r^ + 36 c" r^ - 56 c^ r^- 
237 c^ r"- 175 c^ r^- 12 c^ r^ + 14 c^ r2 + 168 c^ r3+297 c"* r^ + 100 c^ r^+2 c^ r-42 c^ r2-220 c'' r^- 199 c^ r^- 
25 c^ r^-c^-6 c^ r+58 C' r^+lb (? r^+3 c^ r-46 c^ r2-70 c^ r3-15 c r^-3 c*-4 e" r+20 c^ r2+ 

18 c r^ + c^ + cV - 4 c r^ - 3 r^]/[3 (c r - c + 1)^ (c r - c - r)^] . 

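Hence

\[ 4\,\nu_1(\tau,c) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = \kappa_2(\tau,c) = \frac{c\,(1-c)\,\big(2c^4\tau^5 - 7c^4\tau^4 - 4c^3\tau^5 + 8c^4\tau^3 + 14c^3\tau^4 + 3c^2\tau^5 - 2c^4\tau^2 - 16c^3\tau^3 - 7c^2\tau^4 - c\tau^5 - 2c^4\tau + 4c^3\tau^2 + 12c^2\tau^3 + c^4 + 4c^3\tau - 6c^2\tau^2 - 4c\tau^3 - 2c^3 - 3c^2\tau + 4c\tau^2 + c^2 + c\tau - \tau^2\big)}{3\,(c\tau - c + 1)^3\,(c\tau - c - \tau)^3}. \]

For $1/2 < c < 1$, by symmetry, it follows that $\mu_2(\tau,c) = \mu_1(\tau,1-c)$ and $\nu_2(\tau,c) = \nu_1(\tau,1-c)$. ■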



Proof of Theorem 4.8:

Suppose $i = m+1$ (i.e., the support is the right end interval). For $x_1 \in (0,1)$, depending on the location of $x_1$, the following are the different types of the combinations of $N_e(x_1,1)$ and $\Gamma_{1,e}(x_1,1)$.

(i) for $0 < x_1 \le 1/2$, we have $N_e(x_1,1) = (0,\ 2\,x_1)$ and $\Gamma_{1,e}(x_1,1) = (x_1/2,\ 1)$,

(ii) for $1/2 < x_1 < 1$, $N_e(x_1,1) = (0,\ 1)$ and $\Gamma_{1,e}(x_1,1) = (x_1/2,\ 1)$.

Then $\mu_e(1) = P(X_2 \in N_e(X_1,1)) = \int_0^{1/2} 2\,x_1\,dx_1 + \int_{1/2}^{1} 1\,dx_1 = 3/4$.

For $\mathrm{Cov}(h_{12}, h_{13})$, we need to calculate $P_{2N,e}$, $P_{NG,e}$, and $P_{2G,e}$.

\[ P_{2N,e} = P(\{X_2,X_3\} \subset N_e(X_1,1)) = \int_0^{1/2} (2\,x_1)^2\,dx_1 + \int_{1/2}^{1} 1\,dx_1 = 2/3. \]

\[ P_{NG,e} = P(X_2 \in N_e(X_1,1),\ X_3 \in \Gamma_{1,e}(X_1,1)) = \int_0^{1/2} (2\,x_1)(1 - x_1/2)\,dx_1 + \int_{1/2}^{1} 1\,(1 - x_1/2)\,dx_1 = 25/48. \]

Finally, $P_{2G,e} = P(\{X_2,X_3\} \subset \Gamma_{1,e}(X_1,1)) = \int_0^1 (1 - x_1/2)^2\,dx_1 = 7/12$.

Therefore $4\,\mathbf{E}[h_{12}h_{13}] = P_{2N,e} + 2\,P_{NG,e} + P_{2G,e} = 55/24$. Hence $4\,\nu_e(1) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = 55/24 - 4\,(3/4)^2 = 1/24$.

For uniform data, by symmetry, the distribution of the relative density of the subdigraph for $i = 1$ is identical to the $i = m+1$ case. ■

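Proof of Theorem 4.9:

There are two cases for $\tau$, namely, $0 < \tau < 1$ and $\tau \ge 1$.

Case 1: $0 < \tau < 1$: For $x_1 \in (0,1)$, depending on the location of $x_1$, the following are the different types of the combinations of $N_e(x_1,\tau)$ and $\Gamma_{1,e}(x_1,\tau)$.

(i) for $0 < x_1 \le 1-\tau$, we have $N_e(x_1,\tau) = (x_1\,(1-\tau),\ x_1\,(1+\tau))$ and $\Gamma_{1,e}(x_1,\tau) = (x_1/(1+\tau),\ x_1/(1-\tau))$,

(ii) for $1-\tau < x_1 \le 1/(1+\tau)$, we have $N_e(x_1,\tau) = (x_1\,(1-\tau),\ x_1\,(1+\tau))$ and $\Gamma_{1,e}(x_1,\tau) = (x_1/(1+\tau),\ 1)$,

(iii) for $1/(1+\tau) < x_1 < 1$, we have $N_e(x_1,\tau) = (x_1\,(1-\tau),\ 1)$ and $\Gamma_{1,e}(x_1,\tau) = (x_1/(1+\tau),\ 1)$.

Then

\[ \mu_e(\tau) = P(X_2 \in N_e(X_1,\tau)) = \int_0^{1/(1+\tau)} \big(x_1\,(1+\tau) - x_1\,(1-\tau)\big)\,dx_1 + \int_{1/(1+\tau)}^{1} \big(1 - x_1\,(1-\tau)\big)\,dx_1 = \int_0^{1/(1+\tau)} (2\,x_1\,\tau)\,dx_1 + \int_{1/(1+\tau)}^{1} (1 - x_1 + x_1\,\tau)\,dx_1 = \frac{\tau\,(\tau+2)}{2\,(\tau+1)}. \]

For $\mathrm{Cov}(h_{12}, h_{13})$, we need to calculate $P_{2N,e}$, $P_{NG,e}$, and $P_{2G,e}$.

\[ P_{2N,e} = P(\{X_2,X_3\} \subset N_e(X_1,\tau)) = \int_0^{1/(1+\tau)} (2\,x_1\,\tau)^2\,dx_1 + \int_{1/(1+\tau)}^{1} (1 - x_1 + x_1\,\tau)^2\,dx_1 = \frac{\tau^2\,(\tau^2 + 3\tau + 4)}{3\,(\tau+1)^2}. \]

\[ P_{NG,e} = P(X_2 \in N_e(X_1,\tau),\ X_3 \in \Gamma_{1,e}(X_1,\tau)) = \int_0^{1-\tau} (2\,x_1\,\tau)\left(\frac{x_1}{1-\tau} - \frac{x_1}{1+\tau}\right) dx_1 + \int_{1-\tau}^{1/(1+\tau)} (2\,x_1\,\tau)\left(1 - \frac{x_1}{1+\tau}\right) dx_1 + \int_{1/(1+\tau)}^{1} \big(1 - x_1\,(1-\tau)\big)\left(1 - \frac{x_1}{1+\tau}\right) dx_1 = \frac{\tau^2\,(-2\tau^4 - 2\tau^3 + 7\tau^2 + 14\tau + 8)}{6\,(\tau+1)^3}. \]

Finally,

\[ P_{2G,e} = P(\{X_2,X_3\} \subset \Gamma_{1,e}(X_1,\tau)) = \int_0^{1-\tau} \left(\frac{x_1}{1-\tau} - \frac{x_1}{1+\tau}\right)^2 dx_1 + \int_{1-\tau}^{1} \left(1 - \frac{x_1}{1+\tau}\right)^2 dx_1 = \frac{\tau^2\,(3\tau + 4)}{3\,(\tau+1)^2}. \]

Therefore $4\,\mathbf{E}[h_{12}h_{13}] = P_{2N,e} + 2\,P_{NG,e} + P_{2G,e} = \dfrac{\tau^2\,(2\tau^2 + 5\tau + 4)(4 + 2\tau - \tau^2)}{3\,(\tau+1)^3}$. Hence

\[ 4\,\nu_e(\tau) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = \frac{\tau^2\,(4\tau + 4 - 2\tau^4 - 4\tau^3 - \tau^2)}{3\,(\tau+1)^3}. \]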

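Case 2: $\tau \ge 1$: For $x_1 \in (0,1)$, depending on the location of $x_1$, the following are the different types of the combinations of $N_e(x_1,\tau)$ and $\Gamma_{1,e}(x_1,\tau)$.

(i) for $0 < x_1 \le 1/(1+\tau)$, we have $N_e(x_1,\tau) = (0,\ x_1\,(1+\tau))$ and $\Gamma_{1,e}(x_1,\tau) = (x_1/(1+\tau),\ 1)$,

(ii) for $1/(1+\tau) < x_1 < 1$, we have $N_e(x_1,\tau) = (0,\ 1)$ and $\Gamma_{1,e}(x_1,\tau) = (x_1/(1+\tau),\ 1)$.

Then

\[ \mu_e(\tau) = P(X_2 \in N_e(X_1,\tau)) = \int_0^{1/(1+\tau)} x_1\,(1+\tau)\,dx_1 + \int_{1/(1+\tau)}^{1} 1\,dx_1 = \frac{1 + 2\tau}{2\,(\tau+1)}. \]

Next,

\[ P_{2N,e} = P(\{X_2,X_3\} \subset N_e(X_1,\tau)) = \int_0^{1/(1+\tau)} \big(x_1\,(1+\tau)\big)^2 dx_1 + \int_{1/(1+\tau)}^{1} 1\,dx_1 = \frac{3\tau + 1}{3\,(\tau+1)}. \]

\[ P_{NG,e} = P(X_2 \in N_e(X_1,\tau),\ X_3 \in \Gamma_{1,e}(X_1,\tau)) = \int_0^{1/(1+\tau)} x_1\,(1+\tau)\left(1 - \frac{x_1}{1+\tau}\right) dx_1 + \int_{1/(1+\tau)}^{1} \left(1 - \frac{x_1}{1+\tau}\right) dx_1 = \frac{6\tau^3 + 12\tau^2 + 6\tau + 1}{6\,(\tau+1)^3}. \]

Finally,

\[ P_{2G,e} = P(\{X_2,X_3\} \subset \Gamma_{1,e}(X_1,\tau)) = \int_0^1 \left(1 - \frac{x_1}{1+\tau}\right)^2 dx_1 = \frac{3\tau^2 + 3\tau + 1}{3\,(\tau+1)^2}. \]

Therefore $4\,\mathbf{E}[h_{12}h_{13}] = P_{2N,e} + 2\,P_{NG,e} + P_{2G,e} = \dfrac{12\tau^3 + 25\tau^2 + 15\tau + 3}{3\,(\tau+1)^3}$. Hence $4\,\nu_e(\tau) = 4\,\mathrm{Cov}[h_{12}, h_{13}] = \dfrac{\tau^2}{3\,(\tau+1)^3}$. ■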
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APPENDIX 2: Proofs for the Multiple Interval Case 



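We give the proof of Theorem 5.2 first.

Proof of Theorem 5.2:

Recall that $\widetilde{\rho}_{n,m}(\tau,c)$ is the relative arc density of the PCD for the $m > 2$ case. Then it follows that $\widetilde{\rho}_{n,m}(\tau,c)$ is a $U$-statistic of degree two, so we can write it as $\widetilde{\rho}_{n,m}(\tau,c) = \frac{2}{n\,(n-1)} \sum_{i<j} h_{ij}$ where $h_{ij} = (g_{ij} + g_{ji})/2$. Then the expectation of $\widetilde{\rho}_{n,m}(\tau,c)$ is

\[ \mathbf{E}[\widetilde{\rho}_{n,m}(\tau,c)] = \frac{2}{n\,(n-1)} \sum\sum_{i<j} \mathbf{E}[h_{ij}] = \mathbf{E}[h_{12}] = \mathbf{E}[g_{12}] = P((X_1,X_2) \in \mathcal{A}) = \widetilde{\mu}(m,\tau,c). \]

But, by definition of $N(\cdot,\tau,c)$, if $X_1$ and $X_2$ are in different intervals, then $P((X_1,X_2) \in \mathcal{A}) = 0$. So, by the law of total probability, we have

\[ \widetilde{\mu}(m,\tau,c) := P((X_1,X_2) \in \mathcal{A}) = \sum_{i=1}^{m+1} P((X_1,X_2) \in \mathcal{A} \mid \{X_1,X_2\} \subset I_i)\,P(\{X_1,X_2\} \subset I_i) \]
\[ = \sum_{i=2}^{m} \mu(\tau,c)\,P(\{X_1,X_2\} \subset I_i) + \sum_{i \in \{1,m+1\}} \mu_e(\tau)\,P(\{X_1,X_2\} \subset I_i) = \mu(\tau,c) \sum_{i=2}^{m} w_i^2 + \mu_e(\tau) \sum_{i \in \{1,m+1\}} w_i^2, \]

since $P(X_2 \in N(X_1,\tau,c) \mid \{X_1,X_2\} \subset I_i)$ is $\mu(\tau,c)$ for middle intervals and $\mu_e(\tau)$ for the end intervals, and $P(\{X_1,X_2\} \subset I_i) = \left(\dfrac{y_{(i)} - y_{(i-1)}}{\delta_2 - \delta_1}\right)^2 = w_i^2$.

Furthermore, the asymptotic variance is

\[ 4\,\widetilde{\nu}(m,\tau,c) = 4\,\big(\mathbf{E}[h_{12}h_{13}] - \mathbf{E}[h_{12}]\,\mathbf{E}[h_{13}]\big) = 4\,\mathbf{E}[h_{12}h_{13}] - 4\,\big(\widetilde{\mu}(m,\tau,c)\big)^2, \]

where $4\,\mathbf{E}[h_{12}h_{13}] = \widetilde{P}_{2N} + 2\,\widetilde{P}_{NG} + \widetilde{P}_{2G}$ with

\[ \widetilde{P}_{2N} = \sum_{i=2}^{m} P(\{X_2,X_3\} \subset N(X_1,\tau,c) \mid \{X_1,X_2,X_3\} \subset I_i)\,P(\{X_1,X_2,X_3\} \subset I_i) + \sum_{i \in \{1,m+1\}} P(\{X_2,X_3\} \subset N_e(X_1,\tau) \mid \{X_1,X_2,X_3\} \subset I_i)\,P(\{X_1,X_2,X_3\} \subset I_i) \]
\[ = \sum_{i=2}^{m} P_{2N}\,P(\{X_1,X_2,X_3\} \subset I_i) + \sum_{i \in \{1,m+1\}} P_{2N,e}\,P(\{X_1,X_2,X_3\} \subset I_i) = P_{2N} \sum_{i=2}^{m} w_i^3 + P_{2N,e} \sum_{i \in \{1,m+1\}} w_i^3, \]

since $P(\{X_2,X_3\} \subset N(X_1,\tau,c) \mid \{X_1,X_2,X_3\} \subset I_i)$ is $P_{2N}$ for middle intervals and $P_{2N,e}$ for the end intervals, and $P(\{X_1,X_2,X_3\} \subset I_i) = \left(\dfrac{y_{(i)} - y_{(i-1)}}{\delta_2 - \delta_1}\right)^3 = w_i^3$. Similarly,

\[ \widetilde{P}_{NG} = P_{NG} \sum_{i=2}^{m} w_i^3 + P_{NG,e} \sum_{i \in \{1,m+1\}} w_i^3 \]

and

\[ \widetilde{P}_{2G} = P_{2G} \sum_{i=2}^{m} w_i^3 + P_{2G,e} \sum_{i \in \{1,m+1\}} w_i^3. \]

Therefore,

\[ 4\,\widetilde{\nu}(m,\tau,c) = (P_{2N} + 2\,P_{NG} + P_{2G}) \sum_{i=2}^{m} w_i^3 + (P_{2N,e} + 2\,P_{NG,e} + P_{2G,e}) \sum_{i \in \{1,m+1\}} w_i^3 - 4\,\big(\widetilde{\mu}(m,\tau,c)\big)^2. \]

Hence the desired result follows. ■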



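Proof of Theorem 5.1:

Recall that $\rho_{n,m}(\tau,c)$ is version I of the relative arc density of the PCD for the $m > 2$ case. Moreover, $\rho_{n,m}(\tau,c) = \dfrac{n\,(n-1)}{\sum_{i=1}^{m+1} n_i\,(n_i-1)}\ \widetilde{\rho}_{n,m}(\tau,c)$. Then the expectation of $\rho_{n,m}(\tau,c)$, for large $n_i$ and $n$, is

\[ \mathbf{E}[\rho_{n,m}(\tau,c)] = \frac{n\,(n-1)}{\sum_{i=1}^{m+1} n_i\,(n_i-1)}\ \mathbf{E}[\widetilde{\rho}_{n,m}(\tau,c)] \approx \widetilde{\mu}(m,\tau,c)\,\Big(\sum_{i=1}^{m+1} w_i^2\Big)^{-1}, \]

since $\sum_{i=1}^{m+1} \dfrac{n_i\,(n_i-1)}{n\,(n-1)} \approx \sum_{i=1}^{m+1} w_i^2$ for large $n_i$ and $n$. Here $\widetilde{\mu}(m,\tau,c)$ is as in Theorem 5.2.

Moreover, the asymptotic variance of $\rho_{n,m}(\tau,c)$, for large $n_i$ and $n$, is

\[ 4\,\nu(m,\tau,c) = \left(\frac{n\,(n-1)}{\sum_{i=1}^{m+1} n_i\,(n_i-1)}\right)^2 4\,\widetilde{\nu}(m,\tau,c) \approx 4\,\widetilde{\nu}(m,\tau,c)\,\Big(\sum_{i=1}^{m+1} w_i^2\Big)^{-2}, \]

since $\left(\dfrac{n\,(n-1)}{\sum_{i=1}^{m+1} n_i\,(n_i-1)}\right)^2 \approx \Big(\sum_{i=1}^{m+1} w_i^2\Big)^{-2}$ for large $n_i$ and $n$. Here $\widetilde{\nu}(m,\tau,c)$ is as in Theorem 5.2. Hence the desired result follows. ■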
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